Serve simulation under gunicorn (fix concurrent-sim slowdown) + DMP_API_URL alias by RyadT · Pull Request #11 · Delineo-Disease-Modeling/Simulation

RyadT · 2026-05-30T12:36:39Z

Why

Investigated a report that simulations feel ~2x slower on prod (covidmod.isi.jhu.edu) than localhost for the same Barnsdall zone (74002 / min-pop 5000 / greedy-weight, ~7000 people, 720h month-long run).

Measured findings:

- The simulator compute runs at the same speed on prod and local. The in-process DMP is confirmed working on both: every prod run reports timeline_source_counts = {dmp: 24-26k, fallback: 0}, so the slow per-infection HTTP path is not in use.
- The ~2x slowdown appears under concurrent load. The Flask dev-server entrypoint (app.run(threaded=True)) runs CPU-bound sims on threads that share one GIL, so overlapping sims serialize. Localhost is single-user; prod is a shared multi-user host, so the contention shows up there.

Reproduced directly: two concurrent 720h runs on the Flask dev server took 56s and 59s, versus ~31s for a solo run.
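The same effect can be reproduced outside the simulator with a small sketch. This is not the simulator code: `burn` is a hypothetical stand-in for a CPU-bound run, and the workload size is arbitrary. On a standard CPython build with the GIL, two threads should take roughly twice the solo time, while two processes should stay close to it when two cores are free.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def burn(n: int) -> int:
    """Hypothetical stand-in for a CPU-bound simulation run."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(executor_cls, jobs: int, n: int):
    """Run `jobs` copies of burn(n) concurrently; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        results = list(pool.map(burn, [n] * jobs))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    n = 3_000_000  # arbitrary workload size
    solo, _ = timed(ThreadPoolExecutor, 1, n)
    threads, _ = timed(ThreadPoolExecutor, 2, n)
    procs, _ = timed(ProcessPoolExecutor, 2, n)
    print(f"solo: {solo:.2f}s  2 threads: {threads:.2f}s  2 processes: {procs:.2f}s")
```

Absolute timings depend on the machine; only the ratio between the three numbers is of interest.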

What

- gunicorn entrypoint (Dockerfile, requirements.txt): run the sim server under gunicorn with multiple worker processes instead of the Flask dev server. CPU-bound sims now run on separate cores instead of serializing on one GIL.
  - WEB_CONCURRENCY controls worker count (default 4; override per host).
  - --timeout 0 because each /simulation/ request streams SSE for the full run duration; a non-zero timeout would kill long runs mid-stream.
- DMP env-var alias (simulator/config.py): accept DMP_API_URL as an alias for DMP_API_BASE_URL. The deploy compose sets DMP_API_URL, so without this the HTTP-fallback base URL silently defaulted to localhost:8000 (nothing listens there in the sim container) and could never reach the real dmp service if the in-process DMP ever became unavailable. The in-process path is unaffected; this only repairs the fallback target. Backward-compatible (does not require any Deploy change).
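A minimal sketch of how such an alias lookup could work is shown here. It is not the code in simulator/config.py: the function name `resolve_dmp_base_url`, the precedence (canonical name first, then the alias), and the `http://` scheme on the default are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_DMP_BASE_URL = "http://localhost:8000"  # scheme is an assumption


def resolve_dmp_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HTTP-fallback base URL, honoring DMP_API_URL as an alias."""
    env = os.environ if env is None else env
    # Assumed precedence: canonical name first, then the alias.
    for name in ("DMP_API_BASE_URL", "DMP_API_URL"):
        value = env.get(name, "").strip()
        if value:
            return value.rstrip("/")
    return DEFAULT_DMP_BASE_URL
```

Passing the environment in as a mapping keeps the lookup testable without mutating `os.environ`; a deployment that sets neither variable still gets the old default.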

Verification

Ran the app under gunicorn --worker-class sync --timeout 0 with WEB_CONCURRENCY=2 locally:

| Scenario | Flask dev server (old) | gunicorn, 2 workers (new) |
|---|---|---|
| Solo 720h | ~31-40s | 40s |
| 2 concurrent 720h | 56s & 59s (~2x) | 41s & 44s (~1x) |

SSE progress streaming confirmed working through gunicorn; single-sim latency unchanged.
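For reference, the settings used in the verification run could also be expressed as a gunicorn config file. This is a sketch only: the file itself and the `worker_count` helper are hypothetical, and the runs above passed the equivalent options as command-line flags.

```python
# gunicorn.conf.py (hypothetical file; the runs above used command-line flags)
import os
from typing import Mapping


def worker_count(env: Mapping[str, str], default: int = 4) -> int:
    """Read WEB_CONCURRENCY, falling back to `default` when unset or invalid."""
    try:
        count = int(env.get("WEB_CONCURRENCY", default))
    except ValueError:
        return default
    return count if count > 0 else default


workers = worker_count(os.environ)  # one process, and one GIL, per worker
worker_class = "sync"               # matches the verification run above
timeout = 0                         # 0 disables the worker timeout for long SSE streams
```

Such a file would be loaded with `gunicorn -c gunicorn.conf.py <module>:<app>`, where the module and app names depend on the project layout and are not stated here.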

Notes

Single-sim latency is single-core-bound and does not change; this fixes throughput/latency under concurrency.
The deploy-entrypoint test (tests/test_deploy_entrypoints.py) only asserts imports, so it is unaffected by the CMD change.

🤖 Generated with Claude Code

The Flask dev-server entrypoint (app.run, threaded=True) serializes concurrent CPU-bound sims on a single GIL: two overlapping 720h runs each slowed from ~31s solo to ~56-59s (~2x).

Switch the container entrypoint to gunicorn with multiple worker processes so concurrent sims run on separate cores. Verified locally: 2 concurrent 720h runs stay ~41-44s each instead of doubling. WEB_CONCURRENCY sets the worker count; --timeout 0 is required because each /simulation/ request streams Server-Sent Events for the full run duration.

Also accept DMP_API_URL as an alias for DMP_API_BASE_URL: the deploy compose sets DMP_API_URL, so without the alias the HTTP fallback base_url silently defaulted to localhost:8000 (nothing listens there in the sim container) and could never reach the real dmp service if the in-process DMP became unavailable. The in-process path is unaffected; this only fixes the fallback target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

RyadT wants to merge 1 commit into main from ryad/sim-prod-perf-dmp-fix
