feat(docker): containerize the experiment runner + visualizer (#20, #21) #62
Merged
3 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,21 +1,81 @@ | ||
| # MAESTRO - Multi-Agent Evaluation for Structured Relational Output | ||
| # Dockerfile for cross-platform reproducibility | ||
| # MAESTRO — Multi-Agent Evaluation for Structured Relational Output | ||
| # Dockerfile for cross-platform reproducibility. | ||
| # | ||
| # The image carries both halves of the pipeline: | ||
| # * Python 3.11 — runs the experiment (models, strategies, scoring, DB). | ||
| # * mermaid-cli (mmdc) + Chromium — backs the structural-validity metric | ||
| # (analysis/metrics.py shells out to `mmdc` to compute parses_valid; without | ||
| # it that metric is recorded as NULL for every run). | ||
|
|
||
| FROM python:3.11-slim | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # Install system dependencies | ||
| # System dependencies: | ||
| # * git — environment.py records the commit hash per run | ||
| # * nodejs / npm — runtime for mermaid-cli | ||
| # * chromium — mmdc renders via Puppeteer, which needs a browser | ||
| # * the lib*/fonts* — shared libraries Chromium needs to start headless | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| git \ | ||
| nodejs \ | ||
| npm \ | ||
| chromium \ | ||
| fonts-liberation \ | ||
| libasound2 \ | ||
| libatk-bridge2.0-0 \ | ||
| libatk1.0-0 \ | ||
| libcups2 \ | ||
| libdbus-1-3 \ | ||
| libdrm2 \ | ||
| libgbm1 \ | ||
| libgtk-3-0 \ | ||
| libnspr4 \ | ||
| libnss3 \ | ||
| libxcomposite1 \ | ||
| libxdamage1 \ | ||
| libxfixes3 \ | ||
| libxkbcommon0 \ | ||
| libxrandr2 \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Copy project files | ||
| # Image digest for run provenance. An image cannot know its own digest at | ||
| # build time, so it is passed in (e.g. the CI-resolved digest or the git SHA) | ||
| # and baked as an env var; environment.capture_environment() reads it into | ||
| # run_environments.docker_image_digest. Unset → recorded as NULL. | ||
| ARG MAESTRO_IMAGE_DIGEST= | ||
| ENV MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST | ||
|
|
||
| # mermaid-cli, pinned for reproducibility. Puppeteer must use the system | ||
| # Chromium (installed above) rather than downloading its own — and Chromium | ||
| # refuses to run as root without --no-sandbox, which is the norm in CI/Docker. | ||
| ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \ | ||
| PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium | ||
| RUN npm install -g @mermaid-js/mermaid-cli@11.4.2 | ||
|
|
||
| # A Puppeteer launch config so mmdc starts Chromium headless without a sandbox. | ||
| # Chromium refuses to run as root without --no-sandbox; --disable-dev-shm-usage | ||
| # avoids crashes from the small /dev/shm Docker allocates by default. mmdc only | ||
| # honours these via a config file passed with `-p`, so metrics.py reads this | ||
| # path from MERMAID_PUPPETEER_CONFIG and forwards it as `-p`. | ||
| RUN printf '{"args":["--no-sandbox","--disable-gpu","--disable-dev-shm-usage"]}' \ | ||
| > /app/puppeteer.json | ||
| ENV MERMAID_PUPPETEER_CONFIG=/app/puppeteer.json | ||
|
|
||
|
|
||
| # Python project. Copy metadata first so the dependency layer caches across | ||
| # source-only changes. | ||
| COPY pyproject.toml README.md ./ | ||
| COPY src/ ./src/ | ||
|
|
||
| # Install Python dependencies | ||
| RUN pip install --no-cache-dir -e . | ||
|
|
||
| # Default command | ||
| # Sanity: fail the build if mmdc can't actually render, so a broken | ||
| # Chromium/Puppeteer setup is caught here, not 80% into a real run. Uses a temp | ||
| # file (not /dev/stdin) so this checks the browser, not the input path; -p makes | ||
| # the launch config explicit rather than relying on env discovery. | ||
| RUN printf 'flowchart LR\n a["A"] --> b["B"]\n' > /tmp/smoke.mmd \ | ||
| && mmdc -p /app/puppeteer.json -i /tmp/smoke.mmd -o /tmp/smoke.png -e png \ | ||
| && rm -f /tmp/smoke.mmd /tmp/smoke.png | ||
|
|
||
| # Default: print the version. Override with the experiment runner, e.g. | ||
| # docker compose run --rm maestro python -m maestro.run --tier 1 | ||
| CMD ["python", "-c", "import maestro; print(f'MAESTRO v{maestro.__version__}')"] | ||
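The Dockerfile comments say that analysis/metrics.py shells out to `mmdc`, forwards the path in MERMAID_PUPPETEER_CONFIG as `-p`, and records parses_valid as NULL when `mmdc` is unavailable. The actual code in metrics.py is not part of this diff, so the following is only a minimal sketch of that behaviour. The names `build_mmdc_command` and `parses_valid`, the SVG output target, and the choice to treat a timeout as invalid are all assumptions made for illustration.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def build_mmdc_command(
    src: Path, out: Path, env: Optional[Mapping[str, str]] = None
) -> list:
    """Assemble the mmdc argv; add `-p` only when a Puppeteer config is set."""
    env = os.environ if env is None else env
    cmd = ["mmdc", "-i", str(src), "-o", str(out)]
    config = env.get("MERMAID_PUPPETEER_CONFIG")
    if config:
        cmd += ["-p", config]
    return cmd


def parses_valid(diagram: str, timeout: float = 60.0) -> Optional[bool]:
    """Return True/False if mmdc ran, or None (stored as NULL) if it is missing."""
    if shutil.which("mmdc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram.mmd"
        out = Path(tmp) / "diagram.svg"
        src.write_text(diagram, encoding="utf-8")
        try:
            result = subprocess.run(
                build_mmdc_command(src, out),
                capture_output=True,
                timeout=timeout,
                check=False,
            )
        except subprocess.TimeoutExpired:
            # Assumption: a render that hangs is counted as invalid.
            return False
        return result.returncode == 0 and out.exists()
```

Keeping the argv construction in its own function means the `-p` forwarding can be unit-tested without Chromium, while the build-time smoke render in the Dockerfile covers the browser itself.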
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,44 @@ | ||
| services: | ||
| # One image, two uses: | ||
| # * `docker compose up` → serves the Streamlit dashboard (default), | ||
| # reading the experiment DB read-write so a | ||
| # future in-app run can write to it. | ||
| # * `docker compose run --rm maestro python -m maestro.run --tier 1` | ||
| # → runs an experiment on the SAME image; the | ||
| # command overrides the dashboard default. | ||
| # The DB lives on the host at ./out so external tools (Jupyter, sqlite, BI) | ||
| # can read it directly. | ||
| maestro: | ||
| build: . | ||
| build: | ||
| context: . | ||
| args: | ||
| # Optional run-provenance stamp baked into the image, surfaced in | ||
| # run_environments.docker_image_digest. Pass at build time, e.g. | ||
| # MAESTRO_IMAGE_DIGEST=$(git rev-parse HEAD) docker compose build | ||
| # Unset → recorded as NULL. | ||
| MAESTRO_IMAGE_DIGEST: ${MAESTRO_IMAGE_DIGEST:-} | ||
| container_name: maestro | ||
| volumes: | ||
| - .:/app | ||
| env_file: | ||
| - .env | ||
| # Override default command for interactive use | ||
| # command: python -m maestro | ||
| environment: | ||
| # SQLite DB in the mounted output dir so results survive teardown and are | ||
| # reachable from the host. Both the runner (experiment_config.DB_PATH) and | ||
| # the dashboard (viz/settings.py) read this same var. | ||
| MAESTRO_DB_PATH: /app/out/maestro.db | ||
| ports: | ||
| - "8501:8501" | ||
| volumes: | ||
| # Scoped mounts (not the whole repo) so the image's installed package, | ||
| # /app/puppeteer.json, and the globally-installed mmdc are not shadowed. | ||
| - ./data:/app/data:ro # benchmark inputs + ground truth (read-only) | ||
| - ./out:/app/out # experiment DB, persisted + host-accessible | ||
| # Default command = the dashboard, so `docker compose up` just serves it. | ||
| # --server.address=0.0.0.0 lets the host browser reach the container; | ||
| # headless disables Streamlit's browser-open and first-run prompt. | ||
| command: | ||
| - streamlit | ||
| - run | ||
| - src/maestro/viz/app.py | ||
| - --server.address=0.0.0.0 | ||
| - --server.port=8501 | ||
| - --server.headless=true |
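Both files route configuration through environment variables: MAESTRO_IMAGE_DIGEST (build arg baked as an env var, "Unset → recorded as NULL") and MAESTRO_DB_PATH (read by both the runner and the dashboard). One detail worth noting is that `ARG MAESTRO_IMAGE_DIGEST=` followed by `ENV MAESTRO_IMAGE_DIGEST=$MAESTRO_IMAGE_DIGEST` leaves the variable set to an empty string when no value is passed, so a reader has to treat empty as missing to get NULL. The sketch below shows one way that could look. The function names `image_digest` and `db_path` and the fallback `out/maestro.db` are hypothetical and not taken from environment.py, experiment_config.py, or viz/settings.py.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def image_digest(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the baked-in digest, or None when it is unset or empty."""
    env = os.environ if env is None else env
    value = env.get("MAESTRO_IMAGE_DIGEST", "").strip()
    return value or None


def db_path(
    env: Optional[Mapping[str, str]] = None,
    default: str = "out/maestro.db",  # hypothetical fallback for local runs
) -> Path:
    """Resolve the SQLite path shared by the runner and the dashboard."""
    env = os.environ if env is None else env
    return Path(env.get("MAESTRO_DB_PATH") or default)
```

With the compose file above, `db_path()` inside the container would resolve to /app/out/maestro.db, which is the file the `./out:/app/out` mount exposes on the host.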
Empty file.