feat(docker): containerize the experiment runner + visualizer (#20, #21) by Colinho22 · Pull Request #62 · Colinho22/maestro

Colinho22 · 2026-06-12T21:48:00Z

Summary

Make MAESTRO fully reproducible in one Docker image — the reproducibility backbone for the v1.0.0 experimental run. One image runs both halves of the pipeline: the experiment runner (docker compose run) and the Streamlit dashboard (docker compose up). Closes #20 and #21.

The hard part was getting the structural-validity metric to work headless: parses_valid shells out to mmdc (mermaid-cli), which renders via Puppeteer/Chromium. Both are now installed and configured to launch as root in the container.

Usage

Serve the dashboard (default) → http://localhost:8501

docker compose up

Run an experiment on the same image:

docker compose run --rm maestro python -m maestro.run --tier 1

The SQLite DB lives on the host at ./out/maestro.db, so external tools (Jupyter, sqlite, BI) can read it directly.

What changed

Image (#20)

Dockerfile — adds Node + @mermaid-js/mermaid-cli@11.4.2 + headless Chromium and its shared libraries. A build-time smoke test renders a diagram so a broken Chromium/Puppeteer setup fails the build, not a multi-hour run.
metrics.py — passes mmdc -p <config> when MERMAID_PUPPETEER_CONFIG is set. Chromium refuses to run as root without --no-sandbox, which mmdc only honours from a -p config file (not an env var). Unset locally, so nothing changes on dev machines.
experiment_config.py — DB_PATH now respects MAESTRO_DB_PATH, so the DB can write to a mounted host volume without code changes.

Single-service compose (#21)

docker-compose.yml — one service, two uses: the default command serves the dashboard; the runner is the same image with an overriding command. Scoped mounts (./data read-only, ./out) so the installed package, puppeteer.json, and the global mmdc aren't shadowed by a bind mount.
mermaid_render.py — the dashboard's diagram renderer gets the same -p config fix; otherwise in-container renders silently fell back to showing source instead of diagrams.

Design notes

One image, not two. Runner and viz share build: .. The LLM stack (langgraph/crewai) is carried for the runner; the viz reusing it costs nothing extra (no second build, guaranteed dependency parity). Slimming the viz into a separate image would require splitting pyproject.toml into extras — deferred as polish, not correctness.
Latency is a one-time tax. First build ~5 min (pulls Chromium); every build after is cached (~seconds), and docker compose run starts in ~2s. This also sets up a future in-process GUI cleanly — runner and viz are the same code in the same image, so a Streamlit "Run" button would call the run loop directly, with no Docker-in-Docker.

Verification (end-to-end, in-container)

docker compose run ... --tier 1 --dry-run builds the full 1029-cell matrix.
A real cell (bpmn_1_03, deepseek) runs and writes to ./out/maestro.db.
parses_valid is actually computed in-container (mmdc + Chromium launch --no-sandbox) — confirmed a Mermaid parse verdict, not a Chromium launch failure.
docker compose up serves the dashboard; the Diagram Visualizer renders both ground-truth and generated diagrams (not source fallback).
Full test suite (216) passes; ruff check and ruff format --check clean.

Summary by CodeRabbit

Release Notes

New Features
- Docker image now supports structural-metric computation via Mermaid diagram rendering with Chromium integration.
- Streamlit dashboard now runs by default in Docker containers.
Improvements
- Database persistence: experiment SQLite database now mounts to host filesystem for easier access and backup.
- Enhanced Docker setup with Node.js and system-level dependencies for improved reproducibility and cross-platform support.

Build a reproducible image carrying both halves of the pipeline: Python runs the experiment, and mermaid-cli + headless Chromium back the parses_valid metric (metrics.py shells out to mmdc; without it that metric is NULL). Pass mmdc a puppeteer config via -p so Chromium starts --no-sandbox as root in the container. Add MAESTRO_DB_PATH so the SQLite DB writes to a mounted host volume. docker-compose runs the runner with scoped mounts that don't shadow the installed package. Verified end-to-end: a real cell scores parses_valid in-container.

Collapse to a single service: the default command serves the Streamlit dashboard (docker compose up → :8501), and the experiment runner is the same image with an overriding command (docker compose run --rm maestro python -m maestro.run ...). The DB stays on ./out so external tools can read it. Apply the same Puppeteer -p config fix to viz/mermaid_render so diagram rendering works as root in the container (otherwise renders silently fall back to source).

coderabbitai · 2026-06-12T21:48:11Z

📝 Walkthrough — commit 30449fc (MAESTRO_IMAGE_DIGEST provenance)

Summary

Commit 30449fc completes the run-provenance requirement from issue #20 by baking MAESTRO_IMAGE_DIGEST into the image at build time.

Changes

File	Change
`Dockerfile`	Added `ARG MAESTRO_IMAGE_DIGEST=` + `ENV MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST` after the apt-get layer, so routine builds (digest unset) don't bust the npm/pip cache layers
`docker-compose.yml`	Expanded `build: .` to `build: {context, args}` and forwards `MAESTRO_IMAGE_DIGEST: ${MAESTRO_IMAGE_DIGEST:-}` so a CI/release build can pass the digest in one step

All issue #20 requirements — status

Requirement	Status
Persisted host-accessible DB via `./out` bind mount	✅
Scoped mounts (`./data` read-only, `./out` writable)	✅
`MAESTRO_DB_PATH` env var controls DB path	✅
Mermaid/Chromium support for `parses_valid` in-container	✅
`MAESTRO_IMAGE_DIGEST` baked as ARG→ENV	✅ (this commit)
Build-time smoke test	✅

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Dockerfile`:
- Around line 45-47: The image never exposes MAESTRO_IMAGE_DIGEST into the
container; add ARG MAESTRO_IMAGE_DIGEST and wire it to an ENV so runtime can
read it. In the Dockerfile, introduce ARG MAESTRO_IMAGE_DIGEST (preceding any
ENV/RUN that may rely on it) and then set ENV
MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST (near the existing ENV
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD and PUPPETEER_EXECUTABLE_PATH lines) so the
baked digest is available at runtime.
- Around line 49-56: The Dockerfile currently creates /app/puppeteer.json and
sets MERMAID_PUPPETEER_CONFIG while leaving the container running as root, which
is unsafe when Chromium is launched with --no-sandbox; fix by creating a
dedicated non-root user (e.g., mermaid or appuser) and group, chowning /app and
the generated puppeteer.json to that user, and adding a USER instruction to
switch to that non-root user before runtime; ensure the RUN that writes
/app/puppeteer.json and the ENV MERMAID_PUPPETEER_CONFIG lines remain but are
followed by ownership adjustments and a USER <username> so processes (including
Chromium launched by metrics.py) run unprivileged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7784983e-aef8-4cb7-9328-52dff21c796e

📥 Commits

Reviewing files that changed from the base of the PR and between 6f5d819 and c126d82.

📒 Files selected for processing (6)

Dockerfile
docker-compose.yml
out/.gitkeep
src/maestro/analysis/metrics.py
src/maestro/experiment_config.py
src/maestro/viz/mermaid_render.py

environment.capture_environment reads MAESTRO_IMAGE_DIGEST into run_environments.docker_image_digest, but the image never set it — so containerized runs recorded NULL. Add an ARG/ENV and surface it as a compose build arg (MAESTRO_IMAGE_DIGEST=$(git rev-parse HEAD) docker compose build); unset still records NULL.

Colinho22 · 2026-06-12T22:50:59Z

@coderabbitai can you please review last commit 30449fc according to your initial findings?

Also to the precheck scope creep. I am ignoring this for now as design decision was made to keep viz and run in one image.

If you are good with these changes please approve this PR, as well as update your initial comment that currently shows the time out message?

coderabbitai · 2026-06-12T22:53:02Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the specific commit details
gh api repos/Colinho22/maestro/commits/30449fc19b899045138aace96f765c98a9212b3a --jq '.commit.message,.files[].filename'

echo "=== COMMIT DIFF ==="
gh api repos/Colinho22/maestro/commits/30449fc19b899045138aace96f765c98a9212b3a --jq '.files[] | "--- \(.filename) ---\n\(.patch)"'

echo "=== PR COMMENTS ==="
gh pr view 62 --repo Colinho22/maestro --comments | head -200

echo "=== PR REVIEW COMMENTS ==="
gh api repos/Colinho22/maestro/pulls/62/reviews --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, state: .state, body: .body, submitted_at: .submitted_at}'

Length of output: 26858

🏁 Script executed:

#!/bin/bash
# Get current file contents relevant to the changes
echo "=== Dockerfile ==="
cat Dockerfile

echo "=== docker-compose.yml ==="
cat docker-compose.yml

echo "=== experiment_config.py ==="
cat src/maestro/experiment_config.py

echo "=== metrics.py ==="
cat src/maestro/analysis/metrics.py

echo "=== mermaid_render.py ==="
cat src/maestro/viz/mermaid_render.py

Length of output: 53604

🏁 Script executed:

#!/bin/bash
# Find the timed-out / rate-limited review comment to resolve it
gh api repos/Colinho22/maestro/pulls/62/reviews --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, state: .state, submitted_at: .submitted_at, body_snippet: .body[0:120]}'

echo "=== List all coderabbitai issue comments ==="
gh api repos/Colinho22/maestro/issues/62/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, created_at: .created_at, body_snippet: .body[0:150]}'

Length of output: 568

🏁 Script executed:

#!/bin/bash
# Look for the rate-limited comment more broadly
gh api repos/Colinho22/maestro/pulls/62/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body_snippet: .body[0:150]}'

echo "=== Try listing all comments ==="
gh api repos/Colinho22/maestro/issues/62/comments --jq '.[] | {id: .id, user: .user.login, body_snippet: .body[0:100]}'

echo "=== Check PR reviews in full ==="
gh api repos/Colinho22/maestro/pulls/62/reviews --jq '.[] | {id: .id, user: .user.login, state: .state, submitted_at: .submitted_at}'

Length of output: 1282

🏁 Script executed:

#!/bin/bash
# Get the full body of the initial review comments
echo "=== Review 4489416727 comments ==="
gh api repos/Colinho22/maestro/pulls/62/reviews/4489416727/comments --jq '.[] | {id: .id, path: .path, line: .line, body: .body}'

echo "=== Review 4489491674 comments ==="
gh api repos/Colinho22/maestro/pulls/62/reviews/4489491674/comments --jq '.[] | {id: .id, path: .path, line: .line, body: .body}'

echo "=== Full body of rate-limited comment ==="
gh api repos/Colinho22/maestro/issues/comments/4695707176 --jq '.body' | head -80

Length of output: 10478

🏁 Script executed:

#!/bin/bash
# Update the rate-limited comment to replace it with a proper summary
gh api --method PATCH repos/Colinho22/maestro/issues/comments/4695707176 \
  --field body="<!-- This is an auto-generated comment: summarize by coderabbit.ai -->
<!-- review_stack_entry_start -->

[![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/Colinho22/maestro/pull/62?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<details>
<summary>📝 Walkthrough — commit 30449fc (MAESTRO_IMAGE_DIGEST provenance)</summary>

## Summary

Commit 30449fc completes the run-provenance requirement from issue `#20` by baking \`MAESTRO_IMAGE_DIGEST\` into the image at build time.

### Changes

| File | Change |
|------|--------|
| \`Dockerfile\` | Added \`ARG MAESTRO_IMAGE_DIGEST=\` + \`ENV MAESTRO_IMAGE_DIGEST=\$MAESTRO_IMAGE_DIGEST\` after the apt-get layer, so routine builds (digest unset) don't bust the npm/pip cache layers |
| \`docker-compose.yml\` | Expanded \`build: .\` to \`build: {context, args}\` and forwards \`MAESTRO_IMAGE_DIGEST: \${MAESTRO_IMAGE_DIGEST:-}\` so a CI/release build can pass the digest in one step |

### All issue `#20` requirements — status

| Requirement | Status |
|------------|--------|
| Persisted host-accessible DB via \`./out\` bind mount | ✅ |
| Scoped mounts (\`./data\` read-only, \`./out\` writable) | ✅ |
| \`MAESTRO_DB_PATH\` env var controls DB path | ✅ |
| Mermaid/Chromium support for \`parses_valid\` in-container | ✅ |
| \`MAESTRO_IMAGE_DIGEST\` baked as ARG→ENV | ✅ (this commit) |
| Build-time smoke test | ✅ |

</details>" \
  --jq '.id'

Length of output: 1784

@Colinho22: Reviewed commit 30449fc against the initial findings — everything checks out:

Commit 30449fc (MAESTRO_IMAGE_DIGEST):

ARG MAESTRO_IMAGE_DIGEST= placed correctly after the heavy apt-get layer, so routine builds (where the digest is unset/empty) don't needlessly bust the npm install and pip install cache layers.
ENV MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST properly bakes it into the container runtime.
docker-compose.yml forwards it cleanly via ${MAESTRO_IMAGE_DIGEST:-} — the - default means unset on the host stays empty in the build arg, recording NULL as documented.

All initial findings resolved:

Finding	Outcome
Missing `MAESTRO_IMAGE_DIGEST` ARG→ENV	✅ Addressed in this commit
Root + `--no-sandbox` security concern	✅ Withdrawn — valid design rationale (bind-mount ownership + Docker capability constraints make non-root impractical for this local research tool)

Design decision (viz + runner in one image): Acknowledged — the "one image, two commands" pattern is a reasonable simplification for a research artifact, and the scoped mounts ensure neither role shadows the other's files.

The rate-limited comment has been updated with a proper summary. Approving the PR now.

🐰 ✅ LGTM — ship it!

[approve]

Colinho22 added 2 commits June 12, 2026 22:55

Colinho22 added this to the 🧪 Experimental Artifact milestone Jun 12, 2026

Colinho22 self-assigned this Jun 12, 2026

Colinho22 added enhancement New feature or request chore Maintenance, dependencies and infra stuff labels Jun 12, 2026

Colinho22 linked an issue Jun 12, 2026 that may be closed by this pull request

chore: add visualizer service to docker-compose #21

Closed

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread Dockerfile

Comment thread Dockerfile

Colinho22 merged commit 50c6d8b into main Jun 12, 2026
2 checks passed

Colinho22 deleted the feat-containerization branch June 12, 2026 22:56

coderabbitai Bot mentioned this pull request Jun 13, 2026

docs: add experiment run guide, citation, and badges #63

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(docker): containerize the experiment runner + visualizer (#20, #21)#62

feat(docker): containerize the experiment runner + visualizer (#20, #21)#62
Colinho22 merged 3 commits into
mainfrom
feat-containerization

Colinho22 commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Summary

Changes

All issue #20 requirements — status

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Colinho22 commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Colinho22 commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Usage

What changed

Design notes

Verification (end-to-end, in-container)

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

All issue #20 requirements — status

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Colinho22 commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Colinho22 commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading