
feat(docker): containerize the experiment runner + visualizer (#20, #21)#62

Merged
Colinho22 merged 3 commits into main from feat-containerization
Jun 12, 2026

@Colinho22 Colinho22 commented Jun 12, 2026

Summary

Make MAESTRO fully reproducible in one Docker image — the reproducibility backbone for the v1.0.0 experimental run. One image runs both halves of the pipeline: the experiment runner (docker compose run) and the Streamlit dashboard (docker compose up). Closes #20 and #21.

The hard part was getting the structural-validity metric to work headless: parses_valid shells out to mmdc (mermaid-cli), which renders via Puppeteer/Chromium. Both are now installed and configured to launch as root in the container.

Usage

Serve the dashboard (default) → http://localhost:8501

docker compose up

Run an experiment on the same image:

docker compose run --rm maestro python -m maestro.run --tier 1

The SQLite DB lives on the host at ./out/maestro.db, so external tools (Jupyter, sqlite, BI) can read it directly.
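
Because the database is an ordinary SQLite file on the host, any SQLite client can open it. A minimal Python sketch, assuming only the path ./out/maestro.db from above; the helper name `list_tables` is hypothetical, and nothing about the schema is assumed beyond what SQLite itself exposes in `sqlite_master`:

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the SQLite file at db_path."""
    # Open read-only through a file: URI so an external tool cannot
    # modify experiment data by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("./out/maestro.db")` after a run should list whatever tables the runner created. Read-only mode raises `sqlite3.OperationalError` if the file does not exist yet, rather than creating an empty database.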

What changed

Image (#20)

  • Dockerfile — adds Node + @mermaid-js/mermaid-cli@11.4.2 + headless Chromium and its shared libraries. A build-time smoke test renders a diagram so a broken Chromium/Puppeteer setup fails the build, not a multi-hour run.
  • metrics.py — passes mmdc -p <config> when MERMAID_PUPPETEER_CONFIG is set. Chromium refuses to run as root without --no-sandbox, which mmdc only honours from a -p config file (not an env var). Unset locally, so nothing changes on dev machines.
  • experiment_config.py: DB_PATH now respects MAESTRO_DB_PATH, so the DB can be written to a mounted host volume without code changes.
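
The two environment-variable switches described above can be sketched as follows. This is an illustration under stated assumptions, not the code in metrics.py or experiment_config.py: the function names `build_mmdc_command` and `resolve_db_path` and the `default` argument are hypothetical. Only the mmdc flags `-i`, `-o`, and `-p` and the two variable names come from the tooling and the description.

```python
import os
from pathlib import Path


def build_mmdc_command(input_path: str, output_path: str) -> list[str]:
    """Build the mmdc argv, adding -p only when a Puppeteer config is set."""
    cmd = ["mmdc", "-i", input_path, "-o", output_path]
    config = os.environ.get("MERMAID_PUPPETEER_CONFIG")
    if config:
        # Set in the container so Chromium gets its launch arguments from
        # the config file; unset locally, so the command is unchanged.
        cmd += ["-p", config]
    return cmd


def resolve_db_path(default: str) -> Path:
    """Prefer MAESTRO_DB_PATH when set; otherwise use the given default."""
    # `default` stands in for whatever path the project uses locally
    # (an assumption; the real default is not shown in this PR).
    return Path(os.environ.get("MAESTRO_DB_PATH") or default)
```

Both helpers read the environment at call time, so the same code behaves differently inside and outside the container without any branching on "am I in Docker".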

Single-service compose (#21)

  • docker-compose.yml — one service, two uses: the default command serves the dashboard; the runner is the same image with an overriding command. Scoped mounts (./data read-only, ./out) so the installed package, puppeteer.json, and the global mmdc aren't shadowed by a bind mount.
  • mermaid_render.py — the dashboard's diagram renderer gets the same -p config fix; otherwise in-container renders silently fell back to showing source instead of diagrams.
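
To see why a broken Chromium launch went unnoticed in the dashboard, it helps to look at the general shape of a render-with-fallback helper. The sketch below is not mermaid_render.py. The name `render_or_source` is hypothetical, and it assumes a renderer command that reads diagram source on stdin and writes SVG to stdout, which may not match how the project invokes mmdc.

```python
import subprocess
from typing import Sequence


def render_or_source(source: str, cmd: Sequence[str]) -> tuple[str, str]:
    """Return ("diagram", svg) on success, else ("source", source)."""
    try:
        result = subprocess.run(
            list(cmd),
            input=source,
            capture_output=True,
            text=True,
            check=True,  # a non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.CalledProcessError):
        # Missing binary or failed launch: show the raw source instead.
        # The page still loads, which is what makes the failure silent.
        return ("source", source)
    return ("diagram", result.stdout)
```

With a fallback of this kind, a renderer that cannot start looks the same to the user as a diagram that was never rendered, so the fix has to be verified by checking for actual diagrams, as the verification list below does.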

Design notes

  • One image, not two. Runner and viz share the same build context (build: .). The LLM stack (langgraph/crewai) is carried for the runner; reusing it for the viz costs nothing extra (no second build, guaranteed dependency parity). Slimming the viz into a separate image would require splitting pyproject.toml into extras, which is deferred as polish, not correctness.
  • Latency is a one-time tax. First build ~5 min (pulls Chromium); every build after is cached (~seconds), and docker compose run starts in ~2s. This also sets up a future in-process GUI cleanly — runner and viz are the same code in the same image, so a Streamlit "Run" button would call the run loop directly, with no Docker-in-Docker.

Verification (end-to-end, in-container)

  • docker compose run ... --tier 1 --dry-run builds the full 1029-cell matrix.
  • A real cell (bpmn_1_03, deepseek) runs and writes to ./out/maestro.db.
  • parses_valid is actually computed in-container (mmdc + Chromium launch --no-sandbox) — confirmed a Mermaid parse verdict, not a Chromium launch failure.
  • docker compose up serves the dashboard; the Diagram Visualizer renders both ground-truth and generated diagrams (not source fallback).
  • Full test suite (216) passes; ruff check and ruff format --check clean.

Summary by CodeRabbit

Release Notes

  • New Features

    • Docker image now supports structural-metric computation via Mermaid diagram rendering with Chromium integration.
    • Streamlit dashboard now runs by default in Docker containers.
  • Improvements

    • Database persistence: experiment SQLite database now mounts to host filesystem for easier access and backup.
    • Enhanced Docker setup with Node.js and system-level dependencies for improved reproducibility and cross-platform support.

Build a reproducible image carrying both halves of the pipeline: Python runs the
experiment, and mermaid-cli + headless Chromium back the parses_valid metric
(metrics.py shells out to mmdc; without it that metric is NULL). Pass mmdc a
puppeteer config via -p so Chromium starts --no-sandbox as root in the
container. Add MAESTRO_DB_PATH so the SQLite DB writes to a mounted host volume.
docker-compose runs the runner with scoped mounts that don't shadow the
installed package. Verified end-to-end: a real cell scores parses_valid in-container.

Collapse to a single service: the default command serves the Streamlit
dashboard (docker compose up → :8501), and the experiment runner is the same
image with an overriding command (docker compose run --rm maestro python -m
maestro.run ...). The DB stays on ./out so external tools can read it. Apply
the same Puppeteer -p config fix to viz/mermaid_render so diagram rendering
works as root in the container (otherwise renders silently fall back to source).
@Colinho22 Colinho22 added this to the 🧪 Experimental Artifact milestone Jun 12, 2026
@Colinho22 Colinho22 self-assigned this Jun 12, 2026
@Colinho22 Colinho22 added enhancement New feature or request chore Maintenance, dependencies and infra stuff labels Jun 12, 2026
@coderabbitai

coderabbitai Bot commented Jun 12, 2026


📝 Walkthrough — commit 30449fc (MAESTRO_IMAGE_DIGEST provenance)

Summary

Commit 30449fc completes the run-provenance requirement from issue #20 by baking MAESTRO_IMAGE_DIGEST into the image at build time.

Changes

| File | Change |
|------|--------|
| Dockerfile | Added ARG MAESTRO_IMAGE_DIGEST= + ENV MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST after the apt-get layer, so routine builds (digest unset) don't bust the npm/pip cache layers |
| docker-compose.yml | Expanded build: . to build: {context, args} and forwards MAESTRO_IMAGE_DIGEST: ${MAESTRO_IMAGE_DIGEST:-} so a CI/release build can pass the digest in one step |

All issue #20 requirements — status

| Requirement | Status |
|-------------|--------|
| Persisted host-accessible DB via ./out bind mount | ✅ |
| Scoped mounts (./data read-only, ./out writable) | ✅ |
| MAESTRO_DB_PATH env var controls DB path | ✅ |
| Mermaid/Chromium support for parses_valid in-container | ✅ |
| MAESTRO_IMAGE_DIGEST baked as ARG→ENV | ✅ (this commit) |
| Build-time smoke test | ✅ |

@Colinho22 Colinho22 linked an issue Jun 12, 2026 that may be closed by this pull request

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Dockerfile`:
- Around line 45-47: The image never exposes MAESTRO_IMAGE_DIGEST into the
container; add ARG MAESTRO_IMAGE_DIGEST and wire it to an ENV so runtime can
read it. In the Dockerfile, introduce ARG MAESTRO_IMAGE_DIGEST (preceding any
ENV/RUN that may rely on it) and then set ENV
MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST (near the existing ENV
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD and PUPPETEER_EXECUTABLE_PATH lines) so the
baked digest is available at runtime.
- Around line 49-56: The Dockerfile currently creates /app/puppeteer.json and
sets MERMAID_PUPPETEER_CONFIG while leaving the container running as root, which
is unsafe when Chromium is launched with --no-sandbox; fix by creating a
dedicated non-root user (e.g., mermaid or appuser) and group, chowning /app and
the generated puppeteer.json to that user, and adding a USER instruction to
switch to that non-root user before runtime; ensure the RUN that writes
/app/puppeteer.json and the ENV MERMAID_PUPPETEER_CONFIG lines remain but are
followed by ownership adjustments and a USER <username> so processes (including
Chromium launched by metrics.py) run unprivileged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7784983e-aef8-4cb7-9328-52dff21c796e

📥 Commits

Reviewing files that changed from the base of the PR and between 6f5d819 and c126d82.

📒 Files selected for processing (6)
  • Dockerfile
  • docker-compose.yml
  • out/.gitkeep
  • src/maestro/analysis/metrics.py
  • src/maestro/experiment_config.py
  • src/maestro/viz/mermaid_render.py

environment.capture_environment reads MAESTRO_IMAGE_DIGEST into
run_environments.docker_image_digest, but the image never set it — so
containerized runs recorded NULL. Add an ARG/ENV and surface it as a compose
build arg (MAESTRO_IMAGE_DIGEST=$(git rev-parse HEAD) docker compose build);
unset still records NULL.
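
The "unset still records NULL" behaviour can be sketched in a few lines. This is an assumption-laden illustration, not the body of environment.capture_environment: the helper name `read_image_digest` is hypothetical. One detail worth handling is that an `ENV` instruction fed by an empty build `ARG` leaves the variable set to an empty string inside the container, so the empty string has to be treated like "unset".

```python
import os
from typing import Mapping, Optional


def read_image_digest(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the baked digest, or None (stored as NULL) if unset or empty."""
    value = env.get("MAESTRO_IMAGE_DIGEST", "").strip()
    # An empty build arg still yields an empty env var in the image,
    # so both "missing" and "empty" map to None.
    return value or None
```

Passing the mapping as a parameter keeps the helper easy to test without mutating the process environment.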
@Colinho22

@coderabbitai can you please review last commit 30449fc according to your initial findings?

As for the pre-check scope-creep warning: I am ignoring it for now, because the design decision was made to keep viz and run in one image.

If you are happy with these changes, please approve this PR and update your initial comment, which currently shows the timeout message.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the specific commit details
gh api repos/Colinho22/maestro/commits/30449fc19b899045138aace96f765c98a9212b3a --jq '.commit.message,.files[].filename'

echo "=== COMMIT DIFF ==="
gh api repos/Colinho22/maestro/commits/30449fc19b899045138aace96f765c98a9212b3a --jq '.files[] | "--- \(.filename) ---\n\(.patch)"'

echo "=== PR COMMENTS ==="
gh pr view 62 --repo Colinho22/maestro --comments | head -200

echo "=== PR REVIEW COMMENTS ==="
gh api repos/Colinho22/maestro/pulls/62/reviews --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, state: .state, body: .body, submitted_at: .submitted_at}'

Length of output: 26858


🏁 Script executed:

#!/bin/bash
# Get current file contents relevant to the changes
echo "=== Dockerfile ==="
cat Dockerfile

echo "=== docker-compose.yml ==="
cat docker-compose.yml

echo "=== experiment_config.py ==="
cat src/maestro/experiment_config.py

echo "=== metrics.py ==="
cat src/maestro/analysis/metrics.py

echo "=== mermaid_render.py ==="
cat src/maestro/viz/mermaid_render.py

Length of output: 53604


🏁 Script executed:

#!/bin/bash
# Find the timed-out / rate-limited review comment to resolve it
gh api repos/Colinho22/maestro/pulls/62/reviews --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, state: .state, submitted_at: .submitted_at, body_snippet: .body[0:120]}'

echo "=== List all coderabbitai issue comments ==="
gh api repos/Colinho22/maestro/issues/62/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, created_at: .created_at, body_snippet: .body[0:150]}'

Length of output: 568


🏁 Script executed:

#!/bin/bash
# Look for the rate-limited comment more broadly
gh api repos/Colinho22/maestro/pulls/62/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body_snippet: .body[0:150]}'

echo "=== Try listing all comments ==="
gh api repos/Colinho22/maestro/issues/62/comments --jq '.[] | {id: .id, user: .user.login, body_snippet: .body[0:100]}'

echo "=== Check PR reviews in full ==="
gh api repos/Colinho22/maestro/pulls/62/reviews --jq '.[] | {id: .id, user: .user.login, state: .state, submitted_at: .submitted_at}'

Length of output: 1282


🏁 Script executed:

#!/bin/bash
# Get the full body of the initial review comments
echo "=== Review 4489416727 comments ==="
gh api repos/Colinho22/maestro/pulls/62/reviews/4489416727/comments --jq '.[] | {id: .id, path: .path, line: .line, body: .body}'

echo "=== Review 4489491674 comments ==="
gh api repos/Colinho22/maestro/pulls/62/reviews/4489491674/comments --jq '.[] | {id: .id, path: .path, line: .line, body: .body}'

echo "=== Full body of rate-limited comment ==="
gh api repos/Colinho22/maestro/issues/comments/4695707176 --jq '.body' | head -80

Length of output: 10478


🏁 Script executed:

#!/bin/bash
# Update the rate-limited comment to replace it with a proper summary
gh api --method PATCH repos/Colinho22/maestro/issues/comments/4695707176 \
  --field body="<!-- This is an auto-generated comment: summarize by coderabbit.ai -->
<!-- review_stack_entry_start -->

[![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/Colinho22/maestro/pull/62?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<details>
<summary>📝 Walkthrough — commit 30449fc (MAESTRO_IMAGE_DIGEST provenance)</summary>

## Summary

Commit 30449fc completes the run-provenance requirement from issue `#20` by baking \`MAESTRO_IMAGE_DIGEST\` into the image at build time.

### Changes

| File | Change |
|------|--------|
| \`Dockerfile\` | Added \`ARG MAESTRO_IMAGE_DIGEST=\` + \`ENV MAESTRO_IMAGE_DIGEST=\$MAESTRO_IMAGE_DIGEST\` after the apt-get layer, so routine builds (digest unset) don't bust the npm/pip cache layers |
| \`docker-compose.yml\` | Expanded \`build: .\` to \`build: {context, args}\` and forwards \`MAESTRO_IMAGE_DIGEST: \${MAESTRO_IMAGE_DIGEST:-}\` so a CI/release build can pass the digest in one step |

### All issue `#20` requirements — status

| Requirement | Status |
|------------|--------|
| Persisted host-accessible DB via \`./out\` bind mount | ✅ |
| Scoped mounts (\`./data\` read-only, \`./out\` writable) | ✅ |
| \`MAESTRO_DB_PATH\` env var controls DB path | ✅ |
| Mermaid/Chromium support for \`parses_valid\` in-container | ✅ |
| \`MAESTRO_IMAGE_DIGEST\` baked as ARG→ENV | ✅ (this commit) |
| Build-time smoke test | ✅ |

</details>" \
  --jq '.id'

Length of output: 1784



@Colinho22: Reviewed commit 30449fc against the initial findings — everything checks out:

Commit 30449fc (MAESTRO_IMAGE_DIGEST):

  • ARG MAESTRO_IMAGE_DIGEST= placed correctly after the heavy apt-get layer, so routine builds (where the digest is unset/empty) don't needlessly bust the npm install and pip install cache layers.
  • ENV MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST properly bakes it into the container runtime.
  • docker-compose.yml forwards it cleanly via ${MAESTRO_IMAGE_DIGEST:-} — the - default means unset on the host stays empty in the build arg, recording NULL as documented.

All initial findings resolved:

| Finding | Outcome |
|---------|---------|
| Missing MAESTRO_IMAGE_DIGEST ARG→ENV | ✅ Addressed in this commit |
| Root + --no-sandbox security concern | ✅ Withdrawn: valid design rationale (bind-mount ownership + Docker capability constraints make non-root impractical for this local research tool) |

Design decision (viz + runner in one image): Acknowledged — the "one image, two commands" pattern is a reasonable simplification for a research artifact, and the scoped mounts ensure neither role shadows the other's files.

The rate-limited comment has been updated with a proper summary. Approving the PR now.

🐰 ✅ LGTM — ship it!

[approve]

@Colinho22 Colinho22 merged commit 50c6d8b into main Jun 12, 2026
2 checks passed
@Colinho22 Colinho22 deleted the feat-containerization branch June 12, 2026 22:56

Labels

chore Maintenance, dependencies and infra stuff enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

  • chore: add visualizer service to docker-compose
  • chore: finalize Docker setup for experiment runner

1 participant