Skip to content

feat: 3D graph, TanStack Query, parallel Monte Carlo, API hardening#2

Merged
showjihyun merged 51 commits into
mainfrom
feat/graph-3d
Apr 11, 2026
Merged

feat: 3D graph, TanStack Query, parallel Monte Carlo, API hardening#2
showjihyun merged 51 commits into
mainfrom
feat/graph-3d

Conversation

@showjihyun

@showjihyun showjihyun commented Apr 8, 2026

Copy link
Copy Markdown
Owner

Summary

3D Graph Visualization

  • Replace 2D Cytoscape canvas with WebGL/three.js 3D renderer via react-force-graph-3d
  • Community-colored nodes and edges, instanced sphere rendering, auto-scaled resolution
  • Orbit/zoom/pan controls with physics settle

TanStack Query Migration

  • Centralized query/mutation layer (queries.ts) with 30+ hooks
  • Request deduplication, cross-route caching, background revalidation
  • All pages migrated from direct apiClient calls

Parallel Monte Carlo

  • Concurrent execution via asyncio.Semaphore (configurable max_concurrency)
  • Real per-community adoption tracking (replaces global average)
  • Memory leak fixed: orchestrator state cleaned up after each run

API Error Handling Hardening

  • 7 broad except HTTPException catches replaced with 404-only filters
  • Historical sims (post-restart) return empty data instead of 404
  • Ghost simulation bug fixed: sim only starts after DB persistence confirmed
  • Replay endpoint properly returns 500 on failure (was returning fake replay_id)

Engine Improvements

  • Personality drift system with cumulative tracking and MAX_DRIFT cap
  • Campaign controversy parameter wired through agent tick
  • Real intra/inter-community edge counting for cascade detection
  • O(n²) → O(n) community metrics via pre-bucketing
  • Community link counting offloaded to thread pool

DX & Docs

  • Central glossary + HelpTooltip for technical terms
  • CampaignSetupPage split (617 → 111 lines + 6 sub-components)
  • SimulationListPage at /simulation
  • Git branch strategy docs, contributor improvements, issue/PR templates
  • 3 new E2E test files

Pre-Landing Review

12 issues found (5 critical, 7 informational) — all 5 critical issues fixed:

  1. Broad except HTTPException catches → filtered to 404 only (7 locations)
  2. Monte Carlo memory leak → orchestrator cleanup added
  3. O(n²) community metrics → O(n) pre-bucketing
  4. Ghost simulation bug → start after DB confirmation
  5. Blocking event loop → thread pool offload
  6. MC return type annotation corrected

Test plan

  • Backend tests pass (861 passed, 2 skipped)
  • Frontend tests pass (525 passed, 27 test files)
  • All review fixes verified against test suite

🤖 Generated with Claude Code


Documentation

Session doc audit (2026-04-11) — committed in 0d87478.

Files updated

  • README.md — test badge 961|5211002|656; Tech Stack table pytest (961), Vitest (521)pytest (1,002), Vitest (656); "What's working today" 1,482+ automated tests (961 backend + 521 frontend)1,658+ automated tests (1,002 backend + 656 frontend).
  • CLAUDE.md — "Total tests" line updated to 1,658+ GREEN (Backend 1,002 + Frontend 656); backend/frontend run-output lines updated to the current numbers (40 files frontend, 2 skipped backend).
  • CONTRIBUTING.md — "Don't break existing tests — all 1,234+ must stay green" → 1,658+.
  • CHANGELOG.md — populated the ## [Unreleased] section (it was empty) with the session's Added/Changed/Fixed entries: shared diffusion calibration module, community_name field, propagation utilities, LICENSE switch, dynamic community palette, agent label fix, graph overlay rearrangement, lifespan split, graph propagation animation fix, low-centrality propagation restore, deadlock root-cause narrowing, get_agent O(1) cache, regression test rewrite. [0.1.1.0] section left untouched.

Not modified

  • ARCHITECTURE.md — doesn't exist in this repo.
  • TODOS.md — doesn't exist in this repo.
  • VERSION — stays at 0.1.1.0. User chose to keep session fixes in [Unreleased] rather than bump or fold.
  • Historical CHANGELOG entries ([0.1.1.0], [0.1.0.0], [0.1.0]) — preserved verbatim per skill rules.

Verified

  • Frontend npx vitest run — 656/656 pass (40 files).
  • Backend uv run pytest tests/ — 1,002 pass, 2 skip.

Ship Log (2026-04-11)

Consolidation commit 3e3ba11 pushed on top of the existing branch. 106 files, +12,054/-900. This bundles two logical layers of work.

Prior-session work (clean-architecture refactor + features)

  • Repositories + Services layersbackend/app/repositories/ (simulation_repo, project_repo, memory_repo, protocols, simulation_persistence) and backend/app/services/ (simulation_service, community_opinion_service, ports). Session lifecycle is now owned at the service boundary instead of deep inside the engine.
  • Conversation threadsthread_capture.py, ThreadMessageRow ORM, test_22_conversation_threads.py covering the full capture → storage → API pipeline.
  • Expert LLM engine23_EXPERT_LLM_SPEC tests in test_23_expert_llm.py.
  • Community opinion featurecommunity_opinion model + service + API + frontend panels + migration e1_community_opinion.
  • LLM cache observabilitytest_24_cache_observability covering vcache hit path + tier distribution.
  • Frontend component split-outs — DecidePanel, EmergentEventsPanel, EliteLLMNarrativePanel, OverallOpinionPanel, FormProgressBanner, SimilarityWarningBanner, WorkflowStepper, GraphLegend, ZoomTierBadge all extracted with matching test files.
  • Structural test guardsArchitectureInvariants.test.ts + communitySimilarity.test.ts.
  • Test reorganizationtest_21_simulation_quality split into _p1 + _p2 + test_21_memory_pgvector; added test_25_simulation_service + test_26_community_opinion.
  • Migrationd1_bigint_random_seed widens random_seed to bigint so values outside int32 don't overflow.

This session's fixes (2026-04-11)

  1. 🔧 Graph propagation animation restored. Particles weren't drawing during live sims because GraphPanel built activePropLinksRef keys from agent UUIDs while linkDirectionalParticles looked them up by graph node_ids. Translation now lives in propagationAnimationUtils.ts (buildAgentIdToNodeId, buildActivePropLinks) and the real utility is exercised by 5 regression tests — a copy in the test file would have passed silently while the component broke.

  2. 🔧 Low-centrality agents propagate again. InfluenceLayer (the path the agent tick uses) was missing the Round 7-d influence floor and sigmoid emotion smoothing that PropagationModel already had. Typical agents (influence ≈ 0.04–0.1 on small graphs, balanced ef) were producing ~0.2% per-target probability → empty propagation_pairs every step. Both paths now call propagation_calibration.propagation_probability() so there's exactly one file to edit when the calibration is tuned again. Two new regression tests (test_round_7d_low_influence_agents_still_propagate, test_round_7d_negative_emotion_factor_still_propagates) guard against silent drift.

  3. 🔧 Startup deadlock on GET /api/v1/projects/ narrowed by splitting the lifespan into two short-lived transactions with SET LOCAL lock_timeout = '10s', and skipping metadata.create_all entirely when alembic_version is present. Production boot never holds DDL locks on user tables.

  4. community_name field on agent detail. AgentDetailResponse gained a str | None community_name, resolved via a cached community_uuid → cc.name map in SimulationOrchestrator._community_name_map(). Was O(N) graph walk per inspector click, now O(1).

  5. 🎨 Dynamic community palette in 3D graph. Derived from the live graph instead of the hardcoded A/B/C/D/E profile — sims with custom ids ("mainstream", "skeptics", etc.) get real colors. Fallback colors are hashed from the community id so "mainstream" always picks the same slot across re-fetches. Node labels use the graph node_id (Agent #42) instead of the first-8-chars of the deterministic UUID (which was identical for every agent). AgentInspector shows the resolved human name instead of a raw UUID.

  6. 🎨 Graph overlay layout. 3D Controls hint moved to bottom-right, community legend raised 200px, full GraphLegend overlay gained a bottomOffsetPx prop so it stacks above the controls hint without hardcoded magic numbers.

  7. 📜 LICENSE: Apache-2.0 → MIT with a project tagline header. Commercial use, forking, embedding, and downstream redistribution all stay simple. (Committed earlier as 742e2c7.)

Verification evidence

Check Result
Backend uv run pytest tests/ 1,002 passed, 2 skipped (exit 0, 1,416s)
Frontend npx vitest run 656 passed (40 files, 20.6s)
TypeScript npx tsc -b Clean on all touched files (2 pre-existing baseline errors in communitySimilarity.test.ts + 1 in DecidePanel.tsx are unrelated to session changes)
Live end-to-end Fresh sim produces non-empty propagation_pairs; every UUID resolves to a valid graph node_id; /agents/{id} returns community_name: "Alpha"
Concurrent load on /api/v1/projects/ 30×200 OK, no deadlocks

Self-review pass applied

Full self-review ran with the code-review-excellence skill. All 🔴 blocking items and 🟡 important items were addressed:

  • Regression tests call the real utility (not a copy)
  • Round-7d formula deduplicated into one module
  • Low-influence + negative-emotion regression tests added
  • get_agent community lookup cached
  • main.py deadlock docstring softened (root cause not confirmed, only narrowed)
  • Stable palette color hash
  • LEFT_LEGEND_OFFSET_PX extracted
  • AgentInspector no font swap
  • Orphan doc/test claims in README/CLAUDE.md refreshed to match actual counts

Known gh CLI quirk

gh pr edit --body-file fails on this gh version with GraphQL: Projects (classic) is being deprecated (exit 1). Worked around with gh api PATCH repos/.../pulls/2 --input payload.json. Upgrade gh to a post-May-2024 build to drop the projectCards GraphQL call.


Ship Log (2026-04-11, session 2)

Four commits on top of the prior ship log bringing opinion synthesis + calibration to full E2E verification.

What landed

  1. ✨ Cross-community opinion synthesis (R8-2 extension). The R8-2 per-community endpoint already existed; this session added the cross-community aggregate. POST /api/v1/simulations/{sim}/communities/__overall__/opinion-summary synthesises each community as a side-effect then rolls up into a headline narrative. Returns { overall, communities: [...] } in one round-trip. New OverallOpinionPanel component mounted on ScenarioOpinionsPage with per-community collapsible breakdown.

  2. 🔧 Diffusion calibration strengthened (R8-3). MessageStrength.score was 0.4·u + 0.4·n − 0.4·c + 0.5 — spread was 0.94 vs 0.58 (1.62×) and the worst case saturated at 0.10, which meant "stuck at 12%" scenarios couldn't emerge from campaign design alone. Reformulated to 0.6·u + 0.5·n − 0.7·c + 0.3: spread now 0.86 vs 0.31 (2.77×) and the worst case saturates at 0.0, giving the propagation multiplier real headroom to stall. Docstring rewritten to match the actual math (the prior one contradicted the code). 5 parametric tests in test_01_schema.py pin the new coefficients.

  3. 🔧 Ollama stack swapped for VRAM-friendly host (R8-4).

    • Default model: gemma4:latest (9.6 GB, multimodal) → llama3.2:1b (1.3 GB, text). User reported their PC VRAM wasn't enough for gemma4.
    • Ollama image: latest (0.20.x) → pinned to 0.11.10. The 0.20.x series has a llama-runner regression that crashes CPU inference on Ryzen 7500F with "llama runner process has terminated" and no stack trace. Pre-regression build runs cleanly.
    • Aligned across backend/app/config.py, backend/.env.example, docker-compose.yml, frontend/src/config/constants.ts, frontend/src/pages/SettingsPage.tsx, and three test files.
    • Removed unused model blobs (gemma:latest, gemma2:latest, gemma3:latest, phi3:mini) — freed ~15.9 GB.
  4. 🔒 Opinion cache race fix (review C1). community_opinions had non-unique indices but no UNIQUE constraint, so two concurrent requests for the same (sim_id, community_id, step) both missed _find_cached and both paid for a real Tier-3 LLM call — the cache contract was advisory. Added migration e2_community_opinion_unique with UniqueConstraint("simulation_id", "community_id", "step"). _persist_row_with_retry now catches IntegrityError with sqlstate=23505, rolls back, and re-fetches the winner's row via _find_cached so the loser's call returns the canonical existing row instead of retrying a doomed insert. ORM model carries the constraint for consistency. Two new tests (test_unique_violation_returns_winner_row, test_unique_violation_no_winner_row_propagates) cover the race path.

  5. 🔒 LLM structured-output shape guards (review C2). _parse_response was normalising sentiment_trend and clipping summary but themes, divisions, key_quotes went to JSONB untouched. Small LLMs (llama3.2:1b especially) routinely return single strings, dicts, or None where the schema says "list of objects", and the frontend .map() calls were crashing on render. Added three normaliser helpers (_normalise_themes, _normalise_divisions, _normalise_key_quotes) that drop any non-dict elements, require the key fields (theme, faction, agent_id + content), coerce numeric values with safe defaults, clamp strings to column limits, and clamp weight/share to [0, 1]. Seven new parametric tests cover garbage list elements, non-list inputs, missing fields, out-of-range values, and non-string concerns.

  6. 🔧 Non-deadlock rollback gap (review I1). _persist_row_with_retry raised on non-deadlock errors without rolling back, leaving the session in a dirty state the caller couldn't reuse. Now rolls back before re-raising.

  7. 📜 Frontend default model drift (review C3). Backend moved to llama3.2:1b but constants.ts (DEFAULT_OLLAMA_MODEL), SettingsPage.tsx (useState defaults), and two test files still hardcoded gemma4:latest. Aligned all five references.

Verification evidence

Check Result
Backend uv run pytest tests/ 1,028 passed, 2 skipped (+26 opinion tests this session)
Frontend npx vitest run 656 passed (40 files)
Frontend npx tsc --noEmit 0 errors
ESLint on touched files 0 errors, 0 warnings
Live E2E — per-community synthesis Real llama3.2:1b response, no stub, real parsed JSON
Live E2E — cross-community synthesis 5 communities + 1 aggregate call in ~67s, all non-stub
UNIQUE constraint live verification Sequential call 2 returns same opinion_id, DB has exactly 1 row per (sim, community, step)

Files changed this session

  • backend/migrations/versions/e2_community_opinion_unique.py (new)
  • backend/app/models/community_opinion.py (+UniqueConstraint)
  • backend/app/services/community_opinion_service.py (+OverallOpinionSnapshot, +build_overall_prompt, +3 shape normalisers, retry helper now instance method handling IntegrityError/23505, returns canonical row, unique sentinel OVERALL_COMMUNITY_ID)
  • backend/app/api/communities.py (+/__overall__/opinion-summary route declared before the parameterised per-community route so FastAPI matches it first)
  • backend/app/api/schemas.py (+OverallOpinionResponse)
  • backend/app/api/deps.py (+get_llm_gateway, +get_community_opinion_service)
  • backend/app/engine/agent/influence.py (R8-3 formula reformulation + docstring rewrite)
  • backend/app/config.py, backend/.env.example, docker-compose.yml (model + Ollama image pin)
  • backend/tests/test_26_community_opinion.py (+26 tests: shape guards, retry, unique-violation, cross-community)
  • backend/tests/test_01_schema.py (+4 parametric MessageStrength tests for R8-3 coefficients)
  • frontend/src/types/api.ts (+CommunityOpinion, OverallOpinion types)
  • frontend/src/api/client.ts + queries.ts (+communityOpinion client + useCommunityOpinionSynthesis + useOverallOpinionSynthesis)
  • frontend/src/components/community/EliteLLMNarrativePanel.tsx + OverallOpinionPanel.tsx (new)
  • frontend/src/pages/CommunityOpinionPage.tsx + ScenarioOpinionsPage.tsx (mount panels)
  • frontend/src/__tests__/EliteLLMNarrativePanel.test.tsx + OverallOpinionPanel.test.tsx (new, 16 tests)
  • frontend/src/config/constants.ts, frontend/src/pages/SettingsPage.tsx, and three test files (llama3.2:1b alignment)

showjihyun and others added 30 commits April 7, 2026 23:27
Saving in-progress 3D graph rendering work and the layout test that
inspired it. Not ready to merge — branch parked here so PR #1 can land
on master.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the trunk-based, squash-merge workflow Prophet uses:

- master is protected, always deployable, only changes via squash-merged PR
- Short-lived feat/fix/perf/docs/refactor/chore/test branches off master
- Squash merge keeps master history linear (one commit per PR)
- --force-with-lease only, never plain --force on shared branches
- Stacked PR pattern for large features
- Conflict cascade recovery (cherry-pick onto fresh master)

Includes naming conventions, PR title/body templates, anti-patterns,
worked examples, and a quick reference card.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CONTRIBUTING.md
- Fork workflow as the primary path (gh fork --clone, upstream remote)
- 10-step PR walkthrough (claim → sync → branch → test-first → push → PR)
- Draft PR usage
- "What if CI fails on my PR?" debugging section
- "What if master moves while my PR is open?" — merge OR rebase, both fine

GIT_BRANCH_STRATEGY.md
- Audience callout: this doc is for core team; contributors → CONTRIBUTING.md
- New "Two Workflows: Direct vs Fork" section pointing fork users to the
  contributor doc
- Version bump 1.0 → 1.1

.github/
- pull_request_template.md (auto-fills on every new PR)
- ISSUE_TEMPLATE/bug_report.md, feature_request.md, question.md
- ISSUE_TEMPLATE/config.yml — disables blank issues, links to Discussions
  + Security advisories

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aligns with the upcoming default-branch rename. All shell snippets,
prose, scenario headings, and example commands now use main instead
of master.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds <HelpTooltip term="..."/> as a reusable component with a typed
glossary at src/config/glossary.ts. Every UI label that surfaces a
domain-specific term (sentiment, polarization, cascade depth, etc.)
can now display a contextual help icon that explains what the value
means and how to interpret it.

New files
- components/shared/HelpTooltip.tsx — reusable tooltip with anti-flicker
  design (wrapper-level hover, opacity toggle, pointer-events-none on
  popover, configurable left/center/right alignment, three icon sizes).
  Supports either inline label/text props OR a glossary term key.
- config/glossary.ts — typed central glossary with 30+ terms covering
  core simulation, adoption/diffusion, sentiment, emergent behaviors,
  agents/roles, personality, LLM tiers, and simulation flow.

Applied to
- SimulationReportModal — refactored to import shared component +
  use term="..." (removed local copy and the legacy HELP constant)
- StatCard (shared) — gained term + tooltipAlign props so any page
  using StatCard can opt into a tooltip with one prop
- MetricsPanel — Active Agents, Sentiment Distribution, Polarization
  Index, Cascade Depth, Cascade Width, Top Influencers
- CommunityPanel — Communities title
- TopInfluencersPage — all 4 summary stat cards (Influencers Tracked,
  Avg Influence Score, Top Community, Active Cascades)

Tests: 380/380 passing. TopInfluencers test updated to use
getAllByText for labels that legitimately appear in both the rendered
StatCard and its always-rendered (opacity-toggled) tooltip popover.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round-2 audit found 4 lazy-loaded pages still subscribing to the full
steps array, causing chart re-renders on every WebSocket step:

- AnalyticsPage: storeSteps array → stepsLength + latestStep gate.
  All 5 chart helpers (adoption / sentiment / community / events) now
  wrapped in useMemo so recharts SVG paths don't recompute every step.
- GlobalMetricsPage: full steps subscription PLUS appendStep loop on
  history hydration (O(n) store commits → O(n) re-renders of every
  app subscriber). Replaced with setStepsBulk single commit + lazy
  getState() reads inside memos.
- AgentDetailPage: full steps subscription on a lazy page with charts.
  sentimentData and derivedInteractions memos now read lazily.
- CommunitiesDetailPage: clever inline selector that re-runs on every
  steps mutation → use canonical s.latestStep instead.

Tests: 380/380 passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ESLint cleanup (was 6 errors / 1 warning, now 0 / 0):
- AgentDetailPage: removed orphan MOCK_AGENT, MOCK_INTERACTIONS,
  MOCK_CONNECTIONS, MOCK_MESSAGES (page now strictly uses real API data)
- TopInfluencersPage: removed orphan MOCK_INFLUENCERS, DISTRIBUTION_DATA;
  fixed react-hooks/set-state-in-effect by deferring setLoading via
  queueMicrotask
- ControlPanel: removed unused eslint-disable directive

Tests for the shared HelpTooltip + glossary:
- HelpTooltip.test.tsx (15 cases): glossary lookup, hover/click toggle,
  outside-click close, alignment classes, anti-flicker invariants
  (always-rendered DOM, pointer-events-none)
- glossary.test.ts (126 cases): entry shape (label + non-empty text +
  ending punctuation), at least 25 entries, all production-required
  terms exist, type narrowing

AgentDetail tests updated for the new "no mock fallback" behavior:
- All 17 tests now use renderAndWait() so the real API mock has time
  to resolve before assertions run
- 'renders 5 personality trait bars' updated to match the actual
  trait labels derived from the real agent.personality keys

Bundle audit run:
- Initial bundle (index): 74 KB gzipped — well under target
- SimulationPage: 375 KB gzipped (three.js + force-graph)
- Cytoscape isolated to its own 137 KB chunk
- HelpTooltip + glossary cost: 4 KB gzipped (negligible)

Tests: 27 files / 521 passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two complementary optimizations cut SimulationPage gzipped size from
375 KB → 19 KB (a 95% reduction):

1. vite.config.ts: manualChunks splits heavy 3rd-party libs into named,
   stable, cacheable chunks:
   - vendor-three     (three + react-force-graph-3d + 3d-force-graph)
   - vendor-cytoscape (cytoscape, used by FactionMapView/EgoGraph)
   - vendor-recharts  (recharts + victory-vendor)
   - vendor-d3        (remaining d3-* utilities)
   chunkSizeWarningLimit bumped to 600 KB so vendor-three doesn't
   trigger a noisy warning we'd just ignore.

2. SimulationPage.tsx: GraphPanel is now React.lazy() loaded behind a
   <Suspense> boundary. The "No Active Simulation" empty state now
   renders without paying the WebGL bundle cost — three.js (341 KB
   gzipped) only loads when a real simulation is active.

Bundle table after:
  index             106 KB raw / 31 KB gzip  (was 239 / 74)  −58%
  SimulationPage     88 KB raw / 19 KB gzip  (was 1417 / 375) −95%
  vendor-three     1283 KB raw / 341 KB gzip (lazy, only on active sim)
  vendor-cytoscape  434 KB raw / 137 KB gzip (lazy, FactionMap/EgoGraph)
  vendor-recharts   461 KB raw / 134 KB gzip (lazy, analytics pages)
  vendor-d3         101 KB raw / 32 KB gzip  (cacheable shared chunk)
  GraphPanel          8 KB raw / 3 KB gzip   (lazy)

Tests (6 affected) updated to use findByTestId so React Suspense has
time to resolve before assertions:
- SimulationMain.test.tsx: 5 graph-engine tests
- SimulationPage.test.tsx: 'renders graph panel' test

All 521 tests still passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TanStack Query was installed but unused. This commit unlocks request
dedup, cross-route caching, and stale-while-revalidate for the highest-
leverage data fetches.

New module: src/api/queries.ts
- Centralized typed query/mutation hooks
- queryKeys factory for consistent invalidation
- Hooks for: projects, simulations, agents, communities, network,
  llm stats/impact

Migrated (4 components):
1. LLMDashboard
   - Was: useEffect + Promise.allSettled + local state, refetched on
     every step via stepsLength dep
   - Now: useLLMStats(simId, stepsLength) + useLLMImpact — step is in
     the cache key so a new step naturally invalidates. Two components
     calling the same hook get automatic dedup.

2. MetricsPanel
   - Was: useEffect with `Math.floor(stepNum/10)` throttle hack to
     avoid storming the agents endpoint
   - Now: useAgents — TanStack cache eliminates the storm without
     the manual throttle

3. ProjectsListPage
   - Was: useEffect on mount + setProjects + manual loading state
   - Now: useProjects() — page is instant when navigating away and back
   - useCreateProject mutation auto-invalidates the projects list

4. AgentDetailPage (3 separate fetches consolidated)
   - Was: useEffect for agent + Promise.all with network, separate
     useEffect for connections, separate useEffect for memory — three
     independent re-fetches on every navigation
   - Now: useAgent + useNetwork + useAgentMemory in parallel, all cached.
     Connection list and message list derived via useMemo from cached data.
   - Removed dead useEffect import

Test infrastructure:
- src/test/setup.ts: vi.mock('@tanstack/react-query') so every test
  gets a real QueryClient injected without per-test wrapping
- beforeEach clears the test cache so loading-state tests can actually
  observe the loading state

Bundle delta:
- index chunk: 31 → 35 KB gzipped (+4 KB for the query layer)
- All other chunks unchanged
- Worth it for the UX gains (instant back-navigation, no fetch storms)

Tests: 27 files / 521 passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2 migrates 10 additional pages/components from raw apiClient
fetches to typed query/mutation hooks. Combined with Phase 1, the
hot fetch paths are now all going through TanStack Query with
caching, dedup, and SWR.

queries.ts additions:
- useSimulationCompare
- useCommunityTemplates / useCreateCommunityTemplate /
  useUpdateCommunityTemplate / useDeleteCommunityTemplate
- useCreateCommunity / useUpdateCommunity / useDeleteCommunity
- All community/template mutations use refetchQueries (not just
  invalidate) so consumers see fresh data immediately

Migrated pages (10):
1. AnalyticsPage — useSimulationSteps + 5 chart memos
2. GlobalMetricsPage — useSimulationSteps + setStepsBulk hydration
3. CommunityOpinionPage — useSimulationSteps + setStepsBulk hydration
4. ComparisonPage — useSimulationCompare (removed FetchState type +
   manual loading state)
5. CommunityManagePage — useCommunityTemplates + 3 mutations
   (removed local templates state, loadTemplates(), saving state)
6. CommunitiesDetailPage — useCommunities + 3 mutations
   (removed manual refetch after each create/update/delete)
7. TopInfluencersPage — useAgents (consolidated 4 separate setState
   calls into a single useMemo deriving influencers/distribution/stats)
8. GraphPanel — useNetwork (derived graphData via useMemo, removed
   setState-in-effect anti-pattern)
9. AgentInspector — useAgent (cached agent detail, edit-state synced
   via separate useEffect)

Test fixes:
- CommunityManagePage delete-reload test: wait for the Delete button
  to actually render (TanStack data + render delay) instead of
  asserting on raw mockList call count
- CombinedError display: surface templatesQuery.error in the UI
  banner alongside mutation errors

Bundle delta after Phase 1+2 vs original (74 KB initial):
- index: 31 → 35 KB gzipped (+4 KB total for query layer)
- CommunitiesDetailPage: 13.69 → 13.31 KB (-0.4 KB)
- AnalyticsPage: 13.87 → 13.72 KB (-0.15 KB)
- TopInfluencersPage: 21.05 → 20.96 KB
- SimulationPage: 88.09 → 87.82 KB

Tests: 27 files / 521 passing.
ESLint: 0 errors / 0 warnings.

Components still using raw apiClient (intentional, low value to migrate):
- ControlPanel — imperative simulation lifecycle (start/pause/step/stop)
- EngineControlPanel — single mutation
- Inject/MonteCarlo/Replay/AgentIntervene modals — one-shot dispatches
- ScenarioOpinionsPage / ConversationThreadPage — small fetch sites
- ProjectScenariosPage — project mutations
- CampaignSetupPage — project + template fetches (could migrate later)
- LoginPage — auth (mutation pattern, low cache value)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ning pages)

Final phase migrates the remaining 10 components that still used raw
apiClient. After this commit, all meaningful data fetches in the app
go through the typed query/mutation layer.

queries.ts additions:
- Community threads: useCommunityThreads, useCommunityThread
- Project scenarios: useCreateScenario, useRunScenario, useDeleteScenario
- Simulation lifecycle: useCreateSimulation, useStart/Pause/Resume/Stop/
  Step/RunAll (mostly for isPending UI gating)
- Campaign dispatches: useInjectEvent, useReplay, useMonteCarlo,
  useMonteCarloJob (supports refetchInterval polling),
  useEngineControl, useModifyAgent
- Auth: useLogin, useRegister

Migrated:
1. InjectEventModal — useInjectEvent (removed submitting state)
2. ReplayModal — useReplay (removed submitting state)
3. AgentInterveneModal — useModifyAgent
4. EngineControlPanel — useEngineControl (removed applying state)
5. MonteCarloModal — useMonteCarlo for start mutation (polling stays
   imperative for localStorage persistence)
6. LoginPage — useLogin + useRegister (removed loading state, handled
   via isPending)
7. CampaignSetupPage — useProjects + useCommunityTemplates +
   useCreateSimulation + useCreateScenario
8. ProjectScenariosPage — useProject + useRunScenario + useCreateScenario
   + useDeleteScenario + useStopSimulation (removed local project/scenarios
   state + setScenarios mutation-and-mirror pattern)
9. ScenarioOpinionsPage — useSimulationSteps + setStepsBulk hydration
10. ConversationThreadPage — useCommunityThread (removed AbortController
    dance and local apiThread/apiLoading state)
11. ControlPanel — useProjects (initial load only; keep imperative
    lifecycle for lifecycle actions — they're WebSocket-driven)
12. SimulationListPage — useSimulations (was pre-existing raw fetch
    with setState-in-effect lint warning)

Lint hygiene fixes along the way:
- InjectEventModal / ReplayModal: reset-on-open setStates wrapped in
  queueMicrotask to silence react-hooks/set-state-in-effect
- ConversationThreadPage: removed unused useEffect import

Test updates:
- CampaignSetupPage.test.tsx: wait for project option to actually
  render (findByRole('option')) before interacting with the select;
  updated /simulation navigation assertion to match the new
  /simulation/<id> parametric route

Bundle (minor movements, all reductions):
- index: 35.49 → 35.81 KB gzip (+0.3 KB for the extended query layer)
- ConversationThreadPage: 12.82 → 12.51 KB (-0.3 KB)
- CommunitiesDetailPage: unchanged 13.31 KB
- No page gained size

Tests: 27 files / 521 passing.
ESLint: 0 errors / 0 warnings.
Raw apiClient fetch sites left: ControlPanel lifecycle handlers
(intentional — imperative simulation control), SimulationReportModal
export shortcuts (one-shot file download), a few incidental calls that
don't benefit from query caching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CampaignSetupPage was a 617-line monolith violating single
responsibility. Extracted into 1 custom hook + 6 section components +
1 types file. The page is now a thin orchestrator that wires the form
state hook to the section components.

New files:
- src/hooks/useCampaignForm.ts — all form state, queries, mutations,
  handlers, and submit logic (326 lines, independently testable)
- src/components/campaign/types.ts — shared constants (CHANNELS,
  AGENT_TYPES, PERSONALITY_KEYS, COMMUNITY_COLORS) and defaultCommunity
- src/components/campaign/ProjectSelector.tsx — 52 lines
- src/components/campaign/CampaignInfoSection.tsx — 94 lines
  (name / budget / channels / message)
- src/components/campaign/TargetCommunitiesSection.tsx — 53 lines
- src/components/campaign/CampaignAttributesSection.tsx — 85 lines
  (with inner AttributeSlider sub-component eliminating 3 duplicated
  slider blocks)
- src/components/campaign/CommunityConfigurationSection.tsx — 180 lines
  (with inner CommunityCard sub-component, removes the deepest nesting
  in the original file)
- src/components/campaign/AdvancedSettingsSection.tsx — 97 lines

CampaignSetupPage.tsx: 617 → 111 lines (-82%). Every section < 200
lines, each with a clear single responsibility. All existing unit
tests still pass without modification — proof that the refactor
preserves behavior.

New E2E specs (12 tests):

1. e2e/campaign-setup.spec.ts (6 tests) — exercises the refactored
   form end-to-end:
   - renders all 6 form sections
   - submit button disabled without name
   - channel checkboxes toggle independently
   - advanced settings collapsible
   - attribute sliders update displayed value
   - project selector read-only when projectId in URL

2. e2e/help-tooltip.spec.ts (3 tests) — smoke tests for the shared
   HelpTooltip component that now surfaces ~15 glossary terms across
   the UI:
   - metrics panel has help icons for technical terms
   - hover opens tooltip without layout jitter (anti-flicker check)
   - accessible labels match glossary terms

3. e2e/tanstack-cache.spec.ts (3 tests) — verifies the TanStack Query
   migration's main UX promise: cross-route caching. Each test
   intercepts network requests, navigates away and back, and asserts
   that at most 1 revalidation fetch occurs:
   - projects list cached across navigation
   - community list cached per simulation
   - agent list reused across panels

E2E total: 74 → 86 tests.

Unit tests: 27 files / 521 passing.
TypeScript: 0 errors.
ESLint: 0 errors / 0 warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…unting

- Monte Carlo runner executes runs concurrently via asyncio.Semaphore with
  configurable max_concurrency (default 3) and real per-community adoption
- Personality drift system: agents evolve personality based on actions with
  cumulative drift tracking and MAX_DRIFT cap
- Campaign controversy parameter wired through agent tick pipeline
- Real intra/inter-community edge counting for cascade detection
  (replaces hardcoded stubs)
- Community link counting offloaded to thread pool (non-blocking)
- O(n²) community metrics reduced to O(n) via pre-bucketing
- Monte Carlo memory leak fixed: orchestrator state cleaned up after each run
- Monte Carlo return type annotation corrected
- LLM fallback stub tracking via is_fallback_stub flag
- Startup migration marks orphaned running/paused sims as failed
- FK safety in persistence: explicit flush ordering for simulation row
- Replace broad except HTTPException catches with status_code != 404
  filter across 7 endpoints (agents, network, simulations) so real 500s
  surface instead of returning silent empty data
- Historical sims (DB-only after restart) return empty data instead of 404
- run_scenario only starts simulation after DB row is confirmed; aborts
  and cleans up on persistence failure (prevents ghost simulations)
- Replay endpoint now properly returns 500 on failure instead of fake
  replay_id
- Export endpoint falls back to DB for historical simulations
- Replace 2D Cytoscape canvas with WebGL/three.js 3D renderer
- Community-colored nodes and edges with instanced sphere rendering
- Auto-scaled resolution for large graphs (2k+ nodes)
- Physics settle with cooldownTicks + d3AlphaDecay
- Orbit/zoom/pan controls
- EgoGraph filter improvements
- Migrate all pages to TanStack Query hooks from queries.ts
- Central glossary system with HelpTooltip for technical terms
- ControlPanel refactored into focused sub-components
- Page-level improvements across 6 detail/opinion pages
- Updated tests for new query patterns
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- data/community_templates.json: seed data for community templates
- .gitignore: exclude gstack-reports, playwright-mcp logs, pencil files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove all Monte Carlo code, API endpoints, DB model, tests, frontend
components, query hooks, and constants. The feature was adding complexity
without being a core differentiator.

Deleted: monte_carlo.py, test_06_api_monte_carlo.py, MonteCarloModal.tsx
Removed from: simulations.py (3 endpoints), schemas.py, propagation.py,
  diffusion/schema.py, config.py, client.ts, queries.ts, constants.ts,
  ControlPanel.tsx, AnalyticsPage.tsx, glossary.ts + 10 test files cleaned

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- StepRunner now uses the shared gateway instance (was silently creating
  a separate one, so stats never accumulated)
- LLM Gateway tracks total_tokens per call and maintains a 100-entry
  ring buffer of call metadata (provider, latency_ms, tokens, cached)
- Orchestrator.get_llm_stats() returns real cached_calls and total_tokens
- Orchestrator.get_llm_calls() serves from the gateway ring buffer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- AgentDetailPage: connections = edge degree, subscribers = incoming edges
  (computed from the same network query used by EgoGraph)
- TopInfluencersPage: connections and chains derived from network degree
  map (was hardcoded 0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register and login now use the existing users table via SQLAlchemy.
Users survive server restarts. JWT logic unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: remove Celery reference
- README.md: remove Monte Carlo from feature list and workflow
- CHANGELOG: full v0.1.0.0 entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Test badge: 344 → 520 frontend tests
- Remove Monte Carlo from Roadmap shipped list
- 3D graph (react-force-graph-3d) as primary visualization throughout
- Cytoscape.js now listed as EgoGraph-only, not main renderer
- Quick Start flow updated to match current UI (Projects → Scenario)
- Tech stack table: testing counts, frontend stack corrected
- Roadmap shipped list: add auth DB, LLM tracking, agent connections
- Acknowledgments: three.js/react-force-graph-3d replaces Cytoscape
  as main graph credit; Cytoscape credited for EgoGraph
- Remove Twitter placeholder (not active)
- Add Git Branch Strategy to docs section
- Remove emoji prefixes from use-case headers for cleaner look

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1: Exposure Fatigue, Edge Weight Perception, Expert Opinion Score, Prompt Injection Defense
Phase 2: Emotional Contagion, Bounded Confidence (Deffuant), Content Generation prompt
Phase 3: Reflection Engine (Simulacra-style), Homophily edge weighting

55 new tests covering all features.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Unify PropagationEvent: add action_type/generated_content to influence.PropagationEvent
- Remove getattr duck-typing hacks in BridgePropagator and StepRunner
- Clean types.py: pure re-export module (no duplicate class definitions)
- Wire ReflectionEngine into tick.py (both sync and async paths)
- Remove dead run_until_complete hack in sync tick()
- Add _fire_and_forget helper for async task error logging (14 call sites)
- Fix bare except in persistence.py agent serialization
- Add error logging to _config_to_dict and _community_metric_to_dict
- Wrap run_all step_callback with error isolation
- Add network validation failure logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wire target_communities in inject event API + orchestrator
- Add bad_review to allowed event types
- Add frontend cache invalidation on inject success
- Add InjectEventModal tests (14 tests)
- Add EngineControlPanel tests
- Fix propagation animation utils extraction
- Misc linter/formatter fixes across backend and frontend

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- _pick_adapter() now routes by tier: Tier 1/2 → Ollama, Tier 3 → Claude/OpenAI/Gemini
- get_default() uses settings.default_llm_provider instead of os.environ
- run_all endpoint: catch SimulationCapacityError (429), InvalidState (409), generic (500)
- Import all simulation exceptions in API layer
- 8 new tier routing tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend job (Python 3.12): uv sync + pytest (961 tests, no services needed
since all tests use mock adapters).

Frontend job (Node 20): tsc --noEmit + eslint + vitest (562 tests). Build
step omitted because tsc -b surfaces 4 pre-existing errors that need a
separate fix (TopInfluencersPage Recharts types, simulationStore path
alias, vite.config.ts vitest field).

Both jobs run on push to main and all PRs. Concurrency group cancels
stale runs on the same ref. uv and npm caches enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
showjihyun and others added 21 commits April 11, 2026 00:03
…eMemo

Exposed by new CI workflow. Two ESLint errors were live on feat/graph-3d:

1. neighborIdArr assigned but never used (inter-edge loop iterates
   neighborIds Set directly, so the Array.from copy was dead)
2. setLoading/setEmpty called synchronously inside useEffect guard
   (react-hooks/set-state-in-effect) — refactored to derive empty state
   via useMemo from the TanStack Query cache instead of manual effect

Type check, pytest, and vitest already pass locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The @/* path alias was defined in vite.config.ts resolve.alias but not
in TypeScript's compiler config, so tsc -b produced 85 TS2307 errors
across test files and any source file using the alias. Added baseUrl
and paths so TypeScript and Vite agree on module resolution.

Knocks tsc -b errors from 130 to 45 (remaining are real code issues,
not config).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two related doc updates that accumulated in this session:

1. SPEC index: 구 19/20/21 SIMULATION_QUALITY SPECs merged into the
   consolidated 21_SIMULATION_QUALITY_SPEC.md. Updated the Active SPEC
   table + added a consolidation history note.

2. Health Stack typecheck: changed from 'tsc --noEmit' to 'tsc -b'.
   Root tsconfig.json is "files": [] + references-only, so tsc --noEmit
   (without -b) compiles nothing and returns 0 errors — a silent no-op.
   That's why 130 type errors accumulated unnoticed. Added a warning
   note explaining this so the next contributor doesn't repeat it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…NESS)

Four root-level docs were mixed Korean+English and have been translated to
English end-to-end so international contributors can read them:

- AGENTS.md   (10.8% Korean → 0%)  multi-agent working guide
- CLAUDE.md   (14.8% Korean → 0%)  project instructions + SPEC-GATE rules
- DESIGN.md   (5.8% Korean  → 0%)  UI design system + Pencil frame mapping
- HARNESS.md  (20.6% Korean → 0%)  six context-strategy principles

All semantic content preserved exactly:
- SPEC paths, anchor IDs, file names, code blocks
- Enforcement markers (⛔, ✅, ❌) and their meanings
- CLAUDE.md Phase table (test counts refreshed to 961 backend / 521 frontend)
- CLAUDE.md Hard Rules with the same legal weight
- Health Stack tsc -b warning note (added earlier this session)

Other root MD files were already English: CHANGELOG, CODE_OF_CONDUCT,
CONTRIBUTING, ROADMAP, SECURITY. README.md has pending unrelated changes
from the GitHub star conversion rewrite and will be committed separately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch 1 of Step B (tsc -b error reduction). Drops 22 errors: 14 unused
imports/vars + 5 missing name fields + 2 cytoscape mock cast issues + 1
unmountComponentAtNode reference.

- EngineControlPanel/InjectEventModal/SimulationMain/SimulationPage/
  GlobalMetrics: add name to MOCK_SIMULATION (now matches SimulationRun)
- FactionMapView: drop unmountComponentAtNode (deprecated in @testing-library
  18), drop unused React/afterEach, add `unknown` to cytoscape→Mock cast
- UIFlowSpec/PropagationAnimation/EngineControlPanel/InjectEventModal:
  drop unused React, vi, act, afterEach, Routes, Route imports
- glossary.test.ts: rename unused destructured `key` → `_key` in 3 it.each
  blocks

tsc -b errors: 45 → 23. All remaining are in source code (next batch).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch 2 of Step B. Drops 5 tsc -b errors:

- constants.ts: re-export SimulationStatus (was locally imported only,
  consumers like SimulationListPage couldn't reach it through constants)
- api/client.ts: add local `import type { MemoryRecord }` — the existing
  `export type { MemoryRecord }` re-export doesn't bring the symbol into
  local scope when verbatimModuleSyntax is on
- CommunitiesDetailPage: define an explicit LocalCommunity interface and
  replace the stale `typeof COMMUNITIES[number]` return annotation (the
  local COMMUNITIES array was removed in an earlier refactor but the
  annotation lingered). Properly typing influencers and emotions also
  fixes the `inf: any` and `unknown → ReactNode` errors downstream

tsc -b errors: 23 → 18.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch 3 of Step B. Drops 6 tsc -b errors:

- EngineControlPanel: add `unknown` hop to the mutateAsync result cast
  (Record<string, unknown> → EngineControlResponse doesn't overlap directly)
- useProjectScenarioSync: same `unknown` hop for the SimulationRun → Record
  inspection pattern
- CommunityPanel: drop the `metrics.size` branch (field never existed on
  CommunityStepMetrics — speculative API shape that was never materialized);
  fall through to the adoption_rate derivation
- EgoGraph: `cytoscape.Stylesheet[]` → `cytoscape.StylesheetStyle[]`
  (Stylesheet was removed from the type union; StylesheetStyle is the
  variant that carries a `style:` block)
- GraphPanel: drop the `"ResizeObserver" in window` fallback. Modern
  lib.dom declares ResizeObserver as always present on Window, so the
  `window.addEventListener("resize", ...)` branch is unreachable and
  tsc narrowed the `window` symbol to `never` inside it.

tsc -b errors: 18 → 12.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch 4 of Step B. Drops 7 tsc -b errors:

Recharts 3 tightened its Formatter signature (ValueType widened beyond
number, 1-element tuple returns no longer accepted). Fix each call site
by dropping the explicit `v: number` parameter annotation (contextual
typing handles it) and either:
- returning a plain string for ReactNode formatters (AnalyticsPage:288)
- keeping the [value, label] 2-tuple, using String()/Number() at use sites
  (GlobalMetricsPage polarization + sentiment tooltips)
- casting the payload param at its access site for TopInfluencersPage's
  custom tooltip + Bar onClick handlers

Also: SimulationReportModal's useMutation wrapped a void-returning
`apiClient.simulations.export()` (which only opens a window). Wrap it in
an async fn so the mutation function returns Promise<void> as TanStack
Query expects.

tsc -b errors: 12 → 5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…olyfill

Batch 5 (final) of Step B — reaches tsc -b 0 errors.

- GlobalMetricsPage: `latestStep > 0` was comparing a StepResult object to
  a number. Read `latestStep.step` instead so the throttle actually
  checks the step counter
- vite.config.ts: drop the `defineConfig` wrapper and export a plain
  object. `defineConfig` from 'vite' rejects the vitest `test` field, and
  the alternative `defineConfig` from 'vitest/config' pulls in a nested
  copy of vite that collides with the real project's vite when typing
  plugins like react() / tailwindcss(). A plain object works identically
  at runtime for both tools and unblocks tsc -b. Added an explicit
  `manualChunks(id: string)` annotation since we lose contextual typing
- test/setup.ts: polyfill ResizeObserver for jsdom so GraphPanel's
  ResizeObserver-based sizing works in tests (needed after the previous
  batch dropped the legacy `"ResizeObserver" in window` fallback)

Final state:
- tsc -b: 0 errors (was 130 at start of Step B)
- eslint: 0 errors, 0 warnings
- vitest: 562/562 passing
- npm run build: succeeds, produces production bundle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two related changes that complete the type-safety feedback loop:

1. Replace `tsc --noEmit` with `tsc -b`. The old command was a silent
   no-op because the root tsconfig.json is references-only — see the
   Health Stack warning note added to CLAUDE.md earlier in this PR.
2. Add a Build step that runs `npm run build`. This catches bundler
   regressions (missing imports, chunk config issues, asset paths) that
   tsc alone wouldn't surface, and also acts as a second gate on
   tsc -b since `npm run build` is `tsc -b && vite build`.

Prerequisite satisfied: the 130 tsc -b errors that had accumulated
before this feedback loop existed were all fixed in the five fix(types/
tests) commits that land with this one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file was created as part of the 20_CLEAN_ARCHITECTURE_SPEC #4.1
refactor (extracting inline types from api/client.ts) but it was never
committed — only existed locally. CI caught this after Batch 2 of the
Step B fixes added a local `import type { MemoryRecord } from
'../types/api'` that turned the missing file into a hard failure.

Contents: 225 lines of request/response interfaces for Simulation /
Agent / Community / Thread / Settings / LLM endpoints. Zero imports
from other files (pure type definitions), so landing this in isolation
is safe.

Unblocks the tsc -b stage of CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…per row

Closes a UX gap where /simulation was a flat list with no project context
and "New Simulation" dropped users into /setup with no project
pre-selected, forcing them to discover the requirement mid-form.

## Changes

- SimulationListPage header gains a project filter <select> (default
  "All projects"). Selecting a project filters the list client-side and
  changes the "New Simulation" button route from /setup to /setup/:pid.
- Each row renders the owning project inline below the sim name as
  `{simulation_id} · {project_name}`. Orphan sims (unknown project_id)
  render the id alone — no middle-dot, no deleted-project leak.
- Filtered empty state gets its own copy ("No simulations in this
  project") and CTA ("Create in this project" → /setup/:pid).
- SimulationRun type gains optional `project_id?: string | null` so the
  TypeScript compiler can see the field that was already on API responses.
- 11 new tests in SimulationListPage.test.tsx cover SL-AC-01 through
  SL-AC-09 (default filter state, filter application, navigation routing
  both branches, per-row project name, orphan fallback, filtered empty
  state + CTA, projects query loading, projects query error fallback).

## SPEC

New section 18_FRONTEND_PERFORMANCE_SPEC.md §10 defines SL-01 through
SL-05 contracts plus SL-AC-01~09 acceptance criteria. The SPEC file
itself is .gitignore'd per the project's IP protection rule, so this
commit only carries the code that implements it.

## Non-goals

- No server-side filtering: apiClient.simulations.list() stays parameter-
  less. Projects stay in the low-double-digits, sims in the low hundreds —
  the client filter is faster than an extra round trip.
- No persistent filter state: no URL query param, no localStorage.
  Ephemeral state keeps returning users from hitting stale filters.
- No changes to /projects or /projects/:id/scenarios pages.

## Verification

- npx tsc -b: 0 errors
- npx eslint src/pages/SimulationListPage.tsx: 0 errors, 0 warnings
- npx vitest run src/__tests__/SimulationListPage.test.tsx: 11/11 green

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the 201-line Apache-2.0 text with the standard MIT License,
prefaced by a one-line @showjihyun tagline. README, shields, and
pyproject/package manifests already reference MIT, so this resolves
the prior LICENSE-vs-README mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README/CLAUDE/CONTRIBUTING test-count claims updated to the real
numbers: 1,002 backend (was 961/981/1,234 in various places) and
656 frontend (was 521/609).

CHANGELOG [Unreleased] gains the session's graph animation fix
(UUID↔node_id translation), low-centrality propagation restore
(influence floor + sigmoid smoothing deduped into
propagation_calibration.py), startup deadlock hardening, dynamic
community palette, and the LICENSE Apache-2.0→MIT switch. The
existing [0.1.1.0] entry is untouched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This is a large consolidation commit (106 files, +12,054/-900) that bundles
two logical layers of work that accumulated on the feat/graph-3d branch:

## Prior-session work (clean-architecture + feature-dev)

- Repositories layer: backend/app/repositories/ — simulation_repo, project_repo,
  memory_repo, protocols, simulation_persistence split out of engine code so
  the orchestrator no longer owns its own DB sessions
- Services layer: backend/app/services/ — simulation_service, community_opinion_service,
  notification_service, ports — clean separation between HTTP handlers and
  orchestrator/engine, with session lifecycle managed at the service boundary
- Conversation threads: thread_capture pipeline, ThreadMessageRow ORM,
  22_CONVERSATION_THREAD_SPEC tests (test_22_conversation_threads.py)
- Expert LLM engine: richer expert evaluation with SLM fallback
  (23_EXPERT_LLM_SPEC tests in test_23_expert_llm.py)
- Community opinion feature: community_opinion model, service, API,
  CommunityOpinionPanel and pages, migration e1_community_opinion
- LLM cache observability: test_24_cache_observability covering the vcache hit
  path + tier distribution
- Frontend component split-outs: DecidePanel, EmergentEventsPanel,
  EliteLLMNarrativePanel, OverallOpinionPanel, FormProgressBanner,
  SimilarityWarningBanner, WorkflowStepper, GraphLegend, ZoomTierBadge —
  all extracted from their parent containers with matching test files
- ArchitectureInvariants.test.ts + communitySimilarity.test.ts structural guards
- test_21_simulation_quality split into p1/p2 files + test_21_memory_pgvector
- test_25_simulation_service + test_26_community_opinion service-layer tests
- Bigint random_seed migration (d1_bigint_random_seed) — seed column widened
  so values outside int32 range don't overflow

## This session's fixes (documented in CHANGELOG Unreleased)

- **Propagation animation restored** — GraphPanel's active-link keys were
  built from agent UUIDs while linkDirectionalParticles looked them up by
  graph node_ids, so particles never drew. Translation now lives in
  propagationAnimationUtils.ts (buildAgentIdToNodeId + buildActivePropLinks)
  and is exercised by the same regression tests the component uses
- **Low-centrality agents propagate again** — the agent tick path used
  InfluenceLayer which missed the Round 7-d floor + sigmoid emotion
  smoothing that propagation_model.py already had. Both paths now call
  the shared propagation_calibration.propagation_probability() so future
  calibration tweaks live in one file
- **Startup deadlock on /api/v1/projects/** — lifespan split into two
  short-lived transactions with SET LOCAL lock_timeout = '10s', and
  metadata.create_all is skipped entirely when alembic_version is present
- **community_name in agent responses** — AgentDetailResponse gains a
  community_name field, resolved via a cached community_uuid → cc.name
  map (was O(N) graph walk per inspector click, now O(1))
- **Dynamic community palette in 3D graph** — palette is derived from
  the live graph instead of the hardcoded A/B/C/D/E default, so real sims
  with "mainstream"/"skeptics"/etc get real colors. Fallback color is
  hashed from the community id for stability across re-fetches
- **Graph node labels** use the graph node_id (Agent #42) instead of the
  first-8-chars of the agent UUID, which were identical for every
  deterministic-seed agent
- **Graph overlay layout** — 3D Controls hint moved to bottom-right,
  community legend raised 200px, GraphLegend gained a bottomOffsetPx prop
  so the stacking stays coherent
- **Regression tests** — test_01_influence gains
  test_round_7d_low_influence_agents_still_propagate and
  test_round_7d_negative_emotion_factor_still_propagates; PropagationAnimation
  test suite gains 5 tests that exercise the real utility functions

## Verification at commit time

- backend: test_01_influence (10/10), test_07_propagation_pairs (29/29),
  test_04_community_orchestrator, and the agent/influence/propagation/project
  filter (205/205) all green earlier in the session
- frontend: 656/656 pass (40 files) — fresh run at commit time
- tsc -b: clean on all touched files (pre-existing baseline errors in
  communitySimilarity.test.ts and DecidePanel.tsx are not mine)
- Live end-to-end: fresh sim produces non-empty propagation_pairs with
  agent UUIDs that resolve to valid node_ids in the network graph;
  /api/v1/simulations/{id}/agents/{id} returns community_name: "Alpha"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rtial

CI's tsc -b step caught two errors I wrongly dismissed during the /ship
self-review as "pre-existing baseline". They were pre-existing in the
working tree, not in the tracked tree — my consolidation commit (3e3ba11)
staged the test file that surfaced them, so the failing tsc only appeared
after the push. CI run 24282435136 was the signal.

Root cause: `CommunityConfigInput` declared `personality_profile` as
required with all five traits, but:

  - The backend fills missing traits with 0.5 at agent generation
    (`orchestrator._trait()` in `app/engine/simulation/orchestrator.py`).
  - `src/api/client.ts:130,132` already documents the field as optional
    with a partial `Record<string, number>` shape.
  - `communitySimilarity.personalityVector()` defensively uses
    `c.personality_profile ?? {}` and falls back to 0.5 per trait.
  - The failing test `falls back to default 0.5 when personality_profile
    is missing` intentionally exercises the missing-profile path.

The type was lying about the runtime contract. `?: Partial<...>` matches
reality and surfaced two real latent bugs in
`CommunityConfigurationSection.tsx` where the component read
`community.personality_profile[key].toFixed(2)` without any null guard.
Those would have thrown at runtime on any community that omitted traits.

Verification:
- `npx tsc -b` → 0 errors
- `npx vitest run` → 656/656 pass (40 files)
- `npx eslint` (touched files) → clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend CI run 24282569392 failed with 13 test failures + 58 errors,
all of the form:

  OSError: [Errno 111] Connect call failed ('127.0.0.1', 5432)

Not a code regression — the workflow had no Postgres service. The
backend job ran `uv run pytest tests/` on a clean Ubuntu runner that
has nothing listening on 5432, so every test that ends up invoking
`TestClient` (which triggers `app.main.lifespan` → real DB session)
died at connection time. Locally I had `prophet-db-1` Docker container
serving on 5432, so tests passed; CI didn't.

Fix:

1. `services.db` — `pgvector/pgvector:pg16` image, not vanilla
   `postgres:16`, because the lifespan runs
   `CREATE EXTENSION IF NOT EXISTS vector` on startup. A vanilla
   image would fail with "could not open extension control file
   'vector.control'" and leave the app in a half-initialised state.
2. Health check with `pg_isready` so pytest waits for the service
   container to be ready to accept connections. GitHub Actions
   holds job execution until health checks pass.
3. `DATABASE_URL: postgresql+asyncpg://prophet:secret@localhost:5432/prophet`
   at the job level — every step inherits it. Port 5432 matches the
   service container mapping (local dev compose uses 5433 on the
   host to dodge developer-machine Postgres conflicts).

Valkey and Ollama are not added — none of the failing tests hit
those services. LLM tests use the SLM stub path, and LLM cache tests
either mock Valkey or gracefully degrade when it's unavailable. If
a future test regression requires either, add them the same way.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend already ships llama3.2:1b in `backend/app/config.py:31`,
`.env.example`, and `docker-compose.yml` (Round 8-5). The frontend
constants, SettingsPage defaults, README/CONTRIBUTING setup
instructions, and two test fixtures still referenced the old
`gemma4:latest` default — a stale Round 7 choice that the backend
already reverted because 0.20.x Ollama has a CPU-inference regression
on the gemma runner.

This commit closes the frontend-side gap so every surface tells the
same story:

- `frontend/src/config/constants.ts:117` DEFAULT_OLLAMA_MODEL
- `frontend/src/pages/SettingsPage.tsx:41-43` useState defaults for
  ollamaDefaultModel, slmModel, ollamaEmbedModel
- `frontend/src/__tests__/SettingsPage.test.tsx:24-26` mock response
- `frontend/src/__tests__/UIFlowSpec.test.tsx:303` FLOW-29 assertion
- `frontend/src/__tests__/EliteLLMNarrativePanel.test.tsx:68` mock
- `frontend/src/__tests__/OverallOpinionPanel.test.tsx:49,91` mocks

Doc sync so users don't pull the wrong model:

- `README.md` — Quick Start ("Pull LLM model") block: gemma4:latest
  (~9.6 GB) → llama3.2:1b (~1.3 GB), plus the matching acknowledgment
  in the Ollama credits section
- `CONTRIBUTING.md` — "Run it" bootstrap step

The historical comment in `backend/app/config.py:21` that mentions
Round 7 briefly switching to gemma4:latest is intentionally preserved
as documentation of the decision history.

Verification:
- npx vitest run on the 4 touched test files → 56/56 pass
- npx tsc -b → clean
- grep "gemma4" across md/yml/ts/tsx/py/toml → only the intentional
  historical comment remains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ling

Two related robustness fixes for `CommunityOpinionService`, both
discovered while watching real small-LLM runs produce malformed output
or two synthesis requests hit the DB at the exact same step.

## LLM response normalisation (the small-LLM hostile-output path)

Small models (1-3B params) frequently:
  1. Echo the schema literal instead of picking a value — e.g. return
     `"rising|stable|polarising|collapsing"` verbatim in
     `sentiment_trend`, which blows past the `VARCHAR(32)` column.
  2. Return a single string or bare object where the schema says
     "list of objects", crashing the frontend `.map()` renderers.
  3. Mix garbage into `dominant_emotions` — integers, nulls, or empty
     strings interleaved with real emotion words.

The existing `_parse_response` only guarded `summary` and
`sentiment_trend`. The new `_normalise_themes`, `_normalise_divisions`,
and `_normalise_key_quotes` helpers each:

  - Drop the whole element if the shape is wrong (not a dict, missing
    the required key, wrong type) rather than coercing garbage in.
  - Clamp numeric fields (`weight`, `share`) to [0, 1].
  - `_clip_str` every string field to a column-safe max length.
  - Default missing optional fields to 0 or `[]`.

The rationale: better to lose one bad theme than persist garbage that
then crashes the renderer downstream.

## Unique-violation race (sqlstate 23505)

`_persist_row_with_retry` already handled PostgreSQL deadlocks
(`sqlstate 40P01`) but not the other race it can hit: two concurrent
synthesis requests both miss the `_find_cached` lookup, both build a
row for `(sim, community, step)`, and the second one trips the
`uq_community_opinions_sim_comm_step` unique constraint.

Retrying a doomed insert doesn't help — the constraint will reject the
second attempt too. Fix: on 23505, roll back, re-query
`_find_cached`, and return the winner's row. The API caller still
gets a canonical `CommunityOpinionSnapshot`, just built from the other
writer's data.

Also: on non-deadlock `DBAPIError`, explicitly `await
session.rollback()` before re-raising. The previous path left the
session dirty for the caller, which surfaced as cascading "session
already closed" errors downstream.

Return type change: `_persist_row_with_retry` now returns the
`CommunityOpinion` row (either the one inserted or the race winner)
instead of `None`. Both call sites updated to use the returned row
when constructing the snapshot — otherwise a race win would return a
snapshot built from the aborted row.

## Test coverage

`test_26_community_opinion.py` gains a `TestResponseNormalisation`
class (+170 lines) covering:

  - `_normalise_sentiment_trend` — happy cases, American spelling,
    the classic schema-literal echo, unknown values, None, integers
  - `_clip_str` — clipping, empty, None, non-string coercion
  - `_parse_response` — end-to-end with a hostile small-LLM payload
    that mixes all the failure modes

Locally: `uv run pytest tests/test_26_community_opinion.py -q` →
**42 passed in 21s** (against prophet-db-1 Docker Postgres).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aint

Completes the race-handling fix in `ae19716`. Without a DB-level
unique constraint, two concurrent synthesis requests that both miss
`_find_cached` will both INSERT successfully and both pay for a real
Tier-3 LLM call — silently doubling cost and producing conflicting
synthesis rows for the same `(sim, community, step)`.

The service-layer handler added in `ae19716` catches sqlstate 23505
and re-fetches the winner's row, but that handler never fires without
a constraint that actually rejects the second INSERT. These two
commits are load-bearing for each other.

## Changes

- `backend/app/models/community_opinion.py` — declare
  `UniqueConstraint("simulation_id", "community_id", "step",
  name="uq_community_opinions_sim_comm_step")` in `__table_args__`.
  Keeps SQLAlchemy's reflection/introspection in sync with the real
  DB schema so `Base.metadata` matches Alembic.

- `backend/migrations/versions/e2_community_opinion_unique.py` —
  new Alembic migration:

  1. `DELETE FROM community_opinions a USING community_opinions b
     WHERE a.created_at < b.created_at AND ...` — defensive cleanup
     of any pre-existing duplicates. Sequential code couldn't produce
     them, but a database that happened to catch a race pre-fix
     might have some lying around.
  2. `op.create_unique_constraint("uq_community_opinions_sim_comm_step",
     "community_opinions", ["simulation_id", "community_id", "step"])`.
  3. `down_revision: e1_community_opinion` so the migration chain
     stays linear.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… tables

CI run 24282780787 got past the Postgres-connection fix (`c19cc21`)
and hit the next wall: every API test errored with

    asyncpg.exceptions.UndefinedTableError:
        relation "simulations" does not exist

Not a code regression, and not a schema drift — the app's schema
bootstrap never runs at all under the test transport.

## Root cause

API tests use

    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as c:
        ...

That drives the ASGI app directly, bypassing FastAPI's lifespan. So
`app.main.lifespan` — which normally does
`CREATE EXTENSION vector` + `metadata.create_all` + the stale-sim
cleanup — never fires during the test session. On a dev laptop this
goes unnoticed because Docker Postgres already has the schema from a
previous live run or from `alembic upgrade head`. On a fresh CI
Postgres container, nothing has ever created the tables, and the
first insert dies on `UndefinedTableError`.

The existing `_clean_simulation_db` autouse fixture tries to
`TRUNCATE TABLE simulations CASCADE` but silently swallows the
`UndefinedTableError` because the fixture is also running on tests
that don't touch a DB at all. That broad `except Exception: pass`
meant the truncate was a no-op on the broken schema too, so nothing
surfaced the missing tables until the actual query attempted the
same table later in the test body.

## Fix

New `_bootstrap_schema` fixture in `conftest.py`:

  - `scope="session"` — runs once for the whole pytest session.
  - `autouse=True` — every test gets it, whether or not it touches
    the DB. Pure-unit tests just pay one `CREATE EXTENSION` round
    trip (negligible vs total suite cost).
  - Imports `app.models` so every ORM class is registered on
    `Base.metadata` before `create_all` walks the table list.
  - Runs the exact same DDL the lifespan would have run:
    `CREATE EXTENSION IF NOT EXISTS vector` + `uuid-ossp` +
    `Base.metadata.create_all`.
  - Swallows exceptions so tests without a reachable DB (pure
    harness unit tests, CI-less laptop runs) aren't blocked.

## Verification

Locally (prophet-db-1 Docker Postgres), the previously-failing suites:

    $ uv run pytest tests/test_06_api_acceptance.py \
                    tests/test_06_api_simulations.py \
                    tests/test_06_api_agents.py \
                    tests/test_06_api_communities.py \
                    tests/test_06_api_ws.py \
                    tests/test_network_graph.py -q

    ............................................................  [ 69%]
    ..........                                                     [ 100%]
    12 + 85 = 97 passed in ~220s

Those were 71 failures/errors in the previous CI run. With this
fixture the schema is present before any test runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@showjihyun showjihyun merged commit 1669ccb into main Apr 11, 2026
2 checks passed
@showjihyun showjihyun deleted the feat/graph-3d branch April 11, 2026 13:40
showjihyun added a commit that referenced this pull request Apr 11, 2026
…is severed

Ran 6 pilots (UC1/UC2/UC3 baseline+reframed) against the post-R8-3 engine
via a new reusable harness at backend/scripts/run_use_case_pilot.py. All
3 README use cases failed to reproduce their quantitative claims:

  | Case                    | README claim       | Actual   |
  |-------------------------|--------------------|----------|
  | uc1_baseline            | stall at 12%       | 97.3%    |
  | uc1_reframed            | 31%                | 97.4%    |
  | uc2_strategy_b          | echo chamber       | cascade  |
  | uc2_strategy_c          | viral cascade      | cascade  |
  | uc3_rto_raw             | -38% eng sentiment | +0.70    |
  | uc3_rto_restructured    | -60% opposition    | -0.3 pts |

Every pilot produced an identical step-by-step trajectory within a given
population size — controversy swung 0.80 to 0.15, utility 0.20 to 0.85,
and the final adoption rate moved by 0.002. That's the smoking gun: the
campaign framing inputs have zero effect on the simulation.

Root cause: CampaignConfig.{novelty,utility,controversy} are read into
CampaignEvent in step_runner.py and then dropped at the
_build_environment_events() boundary. The agent tick loop builds
MessageStrength from agent-derived values (media_signal,
cognition.evaluation_score) and a campaign_controversy method
parameter that defaults to 0.0 and is never set by any caller. The
entire R8-3 formula reformulation was mathematically correct but
operating on values that never come from the actual user inputs.

What this commit adds:

  * backend/scripts/run_use_case_pilot.py — reusable pilot runner with
    6 named cases, deterministic seeds, httpx-based API driver, and
    JSON-output to docs/pilot_results/{case}.json
  * docs/USE_CASE_PILOTS.md — full side-by-side of README claims vs
    actual engine output, root cause writeup pointing at the exact
    lines in step_runner.py + tick.py, and 5 proposed follow-up items
    (wire fix, regression tests, re-calibration, LLM hardening, README
    disclaimer)
  * docs/pilot_results/*.json — raw per-case artifacts so the analysis
    can be re-verified from the source data

The opinion synthesis plumbing from PR #2 held up perfectly — all 6
pilots got non-stub llama3.2:1b responses through the unique-constraint
+ shape-guarded persistence path. The small LLM hallucinated narratives
that matched the README (e.g. "rapid cascade in early_adopters stalls
against skeptic resistance") while the actual metrics showed every
community at 86-100% adoption. That's a separate hardening follow-up.

Next P1 task is the wire fix. Estimated: ~30 min CC, then a fresh pilot
round to verify. Regression tests in test_04_simulation_acceptance.py
will pin the outcome so this can't silently regress again.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
showjihyun added a commit that referenced this pull request Apr 11, 2026
* docs(pilots): verify README use cases end-to-end, find campaign wire is severed

Ran 6 pilots (UC1/UC2/UC3 baseline+reframed) against the post-R8-3 engine
via a new reusable harness at backend/scripts/run_use_case_pilot.py. All
3 README use cases failed to reproduce their quantitative claims:

  | Case                    | README claim       | Actual   |
  |-------------------------|--------------------|----------|
  | uc1_baseline            | stall at 12%       | 97.3%    |
  | uc1_reframed            | 31%                | 97.4%    |
  | uc2_strategy_b          | echo chamber       | cascade  |
  | uc2_strategy_c          | viral cascade      | cascade  |
  | uc3_rto_raw             | -38% eng sentiment | +0.70    |
  | uc3_rto_restructured    | -60% opposition    | -0.3 pts |

Every pilot produced an identical step-by-step trajectory within a given
population size — controversy swung 0.80 to 0.15, utility 0.20 to 0.85,
and the final adoption rate moved by 0.002. That's the smoking gun: the
campaign framing inputs have zero effect on the simulation.

Root cause: CampaignConfig.{novelty,utility,controversy} are read into
CampaignEvent in step_runner.py and then dropped at the
_build_environment_events() boundary. The agent tick loop builds
MessageStrength from agent-derived values (media_signal,
cognition.evaluation_score) and a campaign_controversy method
parameter that defaults to 0.0 and is never set by any caller. The
entire R8-3 formula reformulation was mathematically correct but
operating on values that never come from the actual user inputs.

What this commit adds:

  * backend/scripts/run_use_case_pilot.py — reusable pilot runner with
    6 named cases, deterministic seeds, httpx-based API driver, and
    JSON-output to docs/pilot_results/{case}.json
  * docs/USE_CASE_PILOTS.md — full side-by-side of README claims vs
    actual engine output, root cause writeup pointing at the exact
    lines in step_runner.py + tick.py, and 5 proposed follow-up items
    (wire fix, regression tests, re-calibration, LLM hardening, README
    disclaimer)
  * docs/pilot_results/*.json — raw per-case artifacts so the analysis
    can be re-verified from the source data

The opinion synthesis plumbing from PR #2 held up perfectly — all 6
pilots got non-stub llama3.2:1b responses through the unique-constraint
+ shape-guarded persistence path. The small LLM hallucinated narratives
that matched the README (e.g. "rapid cascade in early_adopters stalls
against skeptic resistance") while the actual metrics showed every
community at 86-100% adoption. That's a separate hardening follow-up.

Next P1 task is the wire fix. Estimated: ~30 min CC, then a fresh pilot
round to verify. Regression tests in test_04_simulation_acceptance.py
will pin the outcome so this can't silently regress again.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(pilots): fix campaign framing wire + switch to GPU llama3.1:8b

The first pilot round in docs/USE_CASE_PILOTS.md found that every
Prophet simulation produced identical step-by-step trajectories
regardless of campaign framing — controversy=0.8 and controversy=0.2
both landed at final_adoption=0.973±0.001. Root cause (traced to
exact lines in the previous session): CampaignConfig.novelty and
.utility were read into CampaignEvent in step_runner.py and then
silently dropped before reaching the tick loop. Only .controversy was
forwarded, and it was forwarded as a method parameter that defaulted
to 0.0 and was never set by any caller. The entire campaign-framing
UI was effectively decoration.

This commit fixes the wire end-to-end across three layers, then
re-runs all six pilots on GPU to verify the fix.

## Wire fix (Round 8-6)

**1. community_orchestrator.py** — extract all three framing values
from the CampaignEvent and pass them into both AgentTick.tick() and
AgentTick.async_tick() alongside the existing campaign_controversy
forwarding.

**2. tick.py** — MessageStrength construction now blends:
    novelty  = 0.6 * campaign_novelty  + 0.4 * media_signal
    utility  = 0.6 * campaign_utility  + 0.4 * (evaluation_score / 2)
    controversy = campaign_controversy (pure campaign — it's the
                  objective polarising-ness of the message, not an
                  agent-perception quantity)
The 0.6/0.4 weights were tuned so a controversy=0.8 to controversy=0.2
swing produces a ~0.42 point delta in raw score (before clamp),
which is enough to move adoption 20+ points on the early steps.

**3. cognition.py** — Tier-1 rule engine gained a campaign_bonus term:
    bonus = 0.3 * (utility - 0.5) + 0.2 * (novelty - 0.5)
    evaluation += bonus * 2.0
This is centered at 0 for neutral campaigns so prior fixtures stay
green, but shifts evaluation_score by ±0.25 on extreme framings —
enough to move the ADOPT decision threshold meaningfully. evaluate()
and evaluate_async() both take new campaign_novelty + campaign_utility
parameters and the Tier-3 LLM fallback path also threads them through.

## Regression test

test_04_step_runner.py::TestCampaignFramingAffectsOutcome runs two
sims with identical seeds + populations but opposite framings
(friendly: novelty=0.85, utility=0.85, controversy=0.15 vs
hostile: novelty=0.15, utility=0.15, controversy=0.85) and asserts:
    abs(friendly.adoption_rate - hostile.adoption_rate) >= 0.02
    friendly.adoption_rate > hostile.adoption_rate

Without the wire fix the delta is 0.0000 (bit-identical). With the
fix it's +0.1817 at step 4, which would have caught the regression
immediately.

## Post-fix pilot deltas

| Pair | Pre-fix step-0 delta | Post-fix step-0 delta | Post-fix final delta |
|------|:---:|:---:|:---:|
| UC1 baseline -> reframed     | +0.000 | **+0.236** | +0.017 |
| UC2 Strategy B -> Strategy C | +0.000 | **+0.264** | +0.017 |
| UC3 raw -> restructured      | +0.000 | **+0.147** | **+0.185** |

UC3 raw is the clearest win — the hostile RTO mandate now produces
zero viral_cascade events and ends at 74.5% adoption vs 93.1% for
the restructured version. That's a real stall pattern, not just a
faster trajectory. UC1/UC2 still saturate at ~97% because the
1030-agent population crosses cascade critical mass even with
hostile framing; a 5K-10K run at the same weights would likely
produce sharper stalls.

## GPU + model upgrade (Round 8-6 stack changes)

 * Ollama moved to GPU mode via `docker-compose.gpu.yml` — RTX 4070
   SUPER 12 GiB runs llama3.1:8b at ~75 tok/s (CPU mode was ~4-8
   tok/s). Every agent tick + opinion synthesis now completes in
   sub-second wall time.
 * Default model upgraded from llama3.2:1b to llama3.1:8b across
   config.py, .env.example, docker-compose.yml, frontend/config/
   constants.ts and four test files. llama3.1:8b is large enough
   to stay anchored to the provided numeric evidence in the
   opinion-synthesis prompt; the 1B model hallucinated narratives
   matching the README claims instead of the actual metrics.
 * Opinion synthesis timeout reverted from 120s (CPU fallback) back
   to 30s now that GPU inference finishes in ~1-2s.
 * README + CLAUDE.md Quick Start section rewritten with GPU as the
   recommended path and CPU-only as a documented fallback with the
   env-var overrides to flip back to llama3.2:1b.

## Runner + artifacts

`backend/scripts/run_use_case_pilot.py` was retuned to use the
llama3.1:8b default. All six result blobs under
`docs/pilot_results/*.json` regenerated with post-fix trajectories.
`docs/USE_CASE_PILOTS.md` gained a "Post-fix results (Round 8-6)"
section with before/after tables and an updated follow-up list
(population scaling + campaign_bonus weight tuning for sharper
stalls + echo-chamber detector gap).

## Test + CI

 * Backend: `uv run pytest tests/ -q` → **1029 passed, 2 skipped**
   (+1 new regression test, no regressions across the suite)
 * The new test_04_step_runner.py::TestCampaignFramingAffectsOutcome
   is the guardrail for this fix — it would have caught the original
   wire gap immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
showjihyun added a commit that referenced this pull request Apr 13, 2026
…n, validation)

Two-pass code review found 11 issues across 6 backend files:

Critical:
- #1  registry._call_adapter: wrap raw str→LLMPrompt before adapter.complete()
- #2  persist_step retry: re-insert EmergentEvent rows on rollback retry
- #8  deps.py singletons: add threading.Lock + double-checked locking
- #9  load_steps: bound EmergentEvent query with step≤max + limit

Important:
- #3  MC endpoint: asyncio.wait_for(300s) + 504 on timeout
- #4  settings PUT: str() coercion on Chinese LLM provider fields
- #5  monte_carlo.py: remove fragile iscoroutine guard, plain await
- #6  _config_to_dict: dataclasses.asdict for community serialization
- #7  UUID parse: _safe_uuid try/except replaces len>8 heuristic
- #10 persist_step retry: also re-insert agent_states + propagation_events
- #11 settings PUT: str() coercion on Anthropic/OpenAI/Gemini fields too

All 57 targeted tests pass (test_29 + test_06 + test_05).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant