Skip to content

fix: make check_mermaid_valid work on Windows (replace /dev/stdin + /dev/null) #44

@Colinho22

Description

@Colinho22

Motivation

maestro.analysis.metrics.check_mermaid_valid is the function that validates whether a generated diagram parses as Mermaid (via the mmdc CLI). Its result populates metric_results.parses_valid for every successful run — a core column for both the metric pipeline and the GroundTruthEchoControlStrategy ceiling test.

The current implementation uses two Unix-only paths:

result = subprocess.run(
    [mmdc, "-i", "/dev/stdin", "-o", out.name, "-e", "png"],
    input=diagram_code,
    ...
)

There is already an in-source NOTE: Windows-incompatible comment acknowledging this, but the workaround was deferred to "when/if Windows matters".

It now matters. Thesis reviewers are expected to verify reproducibility on Windows, and the institute may pick up the artifact for downstream benchmark work. On Windows:

  • /dev/stdin does not exist → mmdc fails with a file-not-found error
  • tempfile.NamedTemporaryFile(delete=True) holds the file open exclusively → mmdc cannot write the out PNG even if the input path worked

The result is that every parses_valid column would be False on a Windows run, with a path error in parse_error. That contaminates every metric row that uses validity as a filter, and breaks the GroundTruthEchoControl ceiling invariant (it would no longer hit parses_valid=True).

Proposal

Replace both /dev/stdin and the auto-deleted temp file with explicit, cross-platform temp files. mmdc accepts a file path for -i just as readily as it does /dev/stdin.

def check_mermaid_valid(diagram_code: str) -> tuple[bool | None, str | None]:
    """
    ... (existing docstring)
    """
    mmdc = shutil.which("mmdc")
    if mmdc is None:
        return (None, "mmdc not found — validation skipped")

    # Cross-platform: write the diagram to a real temp file (avoids
    # /dev/stdin which doesn't exist on Windows), let mmdc render to
    # another temp file (avoids /dev/null and NamedTemporaryFile's
    # Windows exclusive-lock behaviour). Both files are explicitly
    # cleaned up in a finally block.
    in_fd, in_path = tempfile.mkstemp(suffix=".mmd")
    out_fd, out_path = tempfile.mkstemp(suffix=".png")
    os.close(in_fd)
    os.close(out_fd)
    try:
        Path(in_path).write_text(diagram_code, encoding="utf-8")
        result = subprocess.run(
            [mmdc, "-i", in_path, "-o", out_path, "-e", "png"],
            capture_output=True,
            text=True,
            timeout=15,
        )
        if result.returncode == 0:
            return (True, None)
        return (False, result.stderr.strip()[:500])
    except subprocess.TimeoutExpired:
        return (False, "mmdc timed out after 15s")
    except Exception as e:
        return (False, str(e)[:500])
    finally:
        # Best-effort cleanup; don't let teardown errors mask the result.
        for p in (in_path, out_path):
            try:
                Path(p).unlink(missing_ok=True)
            except OSError:
                pass

Notes on the approach:

  • tempfile.mkstemp returns a file descriptor + path. Closing the fd immediately and writing via a separate Path.write_text avoids both the Windows exclusive-lock issue and the "did the fd actually get closed before subprocess opens it" race that the with NamedTemporaryFile pattern has.
  • We subprocess.run without input=, since we're now passing a real file path.
  • The .mmd suffix on the input file is more accurate than no suffix; mmdc uses it for diagram-type heuristics in some versions.
  • The output file is created so mmdc has somewhere to write, then immediately deleted. We never read it (we only care about the exit code), but Windows requires the file to exist before subprocess opens it.

Scope

In scope:

  • Rewrite of check_mermaid_valid per the proposal above
  • Update / extend the existing test in tests/analysis/test_metrics.py:
    • The test_ground_truth_echo_scores_perfect_ceiling test already exercises the success path; verify it still passes
    • Add a test for the parse-failure path (input that should fail mmdc) — not mermaid at all is already in the parametrized test_zero_entities_in_output_never_crashes, but that test doesn't assert on parses_valid. New explicit test welcome.
  • Drop the existing NOTE: Windows-incompatible comment from the docstring (it'll no longer be true)

Out of scope:

  • Adding Windows to CI matrix — that's a bigger conversation (the ubuntu-latest runner gives the project its Linux baseline; verifying Windows would need a separate runner and adds CI minutes)
  • Manual Windows-machine verification — filing this as a follow-up, but the change is mechanically correct enough to ship without it as long as Linux/macOS pytest still passes

Acceptance criteria

  • check_mermaid_valid no longer references /dev/stdin or /dev/null
  • Both input and output paths come from tempfile.mkstemp with explicit .unlink(missing_ok=True) in a finally block
  • ruff check . + ruff format --check . clean
  • pytest -v green on the existing 20 tests (especially test_ground_truth_echo_scores_perfect_ceiling which exercises the success path through this function)
  • At least one new test asserting parses_valid=False for known-bad input (the negative path through the new code)
  • The Windows-incompatible NOTE comment in metrics.py is removed
  • python -m maestro.run --strategy ground_truth_control --dry-run still produces 1 row (no regression in the matrix builder)

Risk

Small. The function has one clear responsibility (call mmdc, return validity), no callers depend on the temp-file mechanics, and the test suite already covers the happy path with the real ground-truth file. The cleanup-in-finally pattern is standard Python.

Main thing to watch: ensure the out_path is created (zero-byte) on disk before subprocess starts, since mmdc on some platforms / versions checks existence rather than opening with O_CREAT. mkstemp creates the file as part of its contract, so we're fine — but worth a sanity check via the existing ceiling test.

Verification path

  • pytest -v on the dev machine (Linux/macOS) — all 20 tests pass, same as before.
  • Optional: on a Windows machine with mmdc on PATH, run the same pytest invocation. The ceiling test should produce parses_valid=True for the bpmn ground truth. Without this PR, it would have produced False with a /dev/stdin not found error.

Related

  • src/maestro/analysis/metrics.py:check_mermaid_valid (the function being fixed)
  • tests/analysis/test_metrics.py:test_ground_truth_echo_scores_perfect_ceiling (already exercises the happy path)
  • PR #42 / Issue #14 (tzdata for Windows in PR #42 was the first Windows-compat fix; this is the second)
  • Issue for v1.0.0 (chore: cut v1.0.0 release before final thesis experimental run #43 )release tagging — this PR should land before v1.0.0 since reviewer Windows compatibility is a thesis-defense concern
  • Issues #20 / #21 (Docker setup) — Docker is the belt; this PR is the braces. A Windows reviewer who can install Docker is covered by chore: finalize Docker setup for experiment runner #20; a Windows reviewer who runs pytest natively to sanity-check the test suite is covered by this PR.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions