chore: cut v1.0.0 release before final thesis experimental run #43

## Motivation

Once MAESTRO is ready to run the experimental matrix that produces the thesis data, the code should be frozen as a citable release, so that:

1. **The thesis can reference an exact, named code state** — "the data in Table 3 was produced by MAESTRO v1.0.0" is something a reader can verify, unlike a bare commit SHA.
2. **The `run_environments.git_commit` column already records the SHA per row** — pairing that with a human-readable tag closes the provenance loop. Anyone can cross-check the DB against the GitHub release.
3. **Post-run code cleanup, small improvements, and CodeRabbit refactors don't pollute the experimental codebase** — they happen on `main` after v1.0.0, while the thesis points at the frozen tag.

## Pre-flight checklist (before tagging)

- [x] All open feature/chore issues that affect experimental behaviour are closed
- [x] `main` is up to date, `git status` is clean (no uncommitted work — the `git_dirty=1` warning we built in must be silent on the run)
- [x] `ruff check .` + `ruff format --check .` both clean
- [x] `pytest -v` green
- [x] `python -m maestro.run --dry-run` produces the expected matrix shape
- [x] `docker build .` succeeds and the container runs `--dry-run` correctly
- [x] Input corpus is finalised (data files committed under `data/`, tier classifications confirmed against proposal §3)
- [x] `MODELS` list reflects the final set of providers you intend to evaluate
- [x] `DEFAULT_REPEATS` reflects the final statistical-power decision

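The clean-tree item in the checklist can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of MAESTRO itself: the helper names `dirty_paths` and `working_tree_is_clean` are hypothetical, and it assumes `git` is on the `PATH`.

```python
import subprocess


def dirty_paths(porcelain_output: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each line has the form "XY <path>": two status characters,
    one space, then the path.
    """
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """Return True if git reports no modified, staged, or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return not dirty_paths(result.stdout)
```

Calling `working_tree_is_clean()` from the repository root just before tagging gives a yes/no answer that is easy to wire into a script.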
## Release procedure

```bash
# 1. Final verification on clean main
git checkout main && git pull
git status                       # must be clean
pytest -v                        # all green
docker build .                   # builds cleanly

# 2. Tag the commit
git tag -a v1.0.0 -m "Thesis experimental run — code freeze"
git push origin v1.0.0

# 3. Create the GitHub release with notes (gh CLI)
gh release create v1.0.0 \
  --title "v1.0.0 — Thesis experimental run" \
  --notes-file release-notes-v1.0.0.md

# 4. Run the experiment (containerised, on the tagged code)
docker run ... python -m maestro.run

# 5. After the run, attach the final maestro.db (or its SHA-256) as a release asset
sha256sum maestro.db > maestro.db.sha256
gh release upload v1.0.0 maestro.db.sha256
# Optional: gh release upload v1.0.0 maestro.db  (if size + privacy allow)
```

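On systems without `sha256sum` (macOS, for example, ships `shasum -a 256` instead), the checksum in step 5 can be produced with the Python standard library. This is a sketch under that assumption; the function name `sha256_line` is hypothetical and not part of MAESTRO.

```python
import hashlib
from pathlib import Path


def sha256_line(path: str, chunk_size: int = 1 << 20) -> str:
    """Return a checksum line in the "<hex digest>  <file name>" layout.

    The file is read in chunks so a large database does not have to
    fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {Path(path).name}"
```

Writing `sha256_line("maestro.db")` plus a trailing newline to `maestro.db.sha256` should give a file that `sha256sum -c` accepts when run next to the database.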
## Release notes template (release-notes-v1.0.0.md)

### MAESTRO v1.0.0 — Thesis experimental run

Code state that produced the experimental data reported in the MAESTRO thesis (Multi-Agent Evaluation for Structured Relational Output, FHGR FS26).

### What this release contains

- Four orchestration strategies under test: SingleAgent, SOP, CrewAI, LangGraph
- Three control conditions: NullControl, CopyInputControl, GroundTruthEchoControl
- N-LLM providers: Provider A (model 1; model n), ...
- Full evaluation pipeline: structural validity (via mmdc), entity F1 (id/name/lemma), relationship F1 (relaxed/strict), error taxonomy
- Reproducibility instrumentation: per-invocation environment capture (OS, arch, Python, git commit, lib versions), per-call retry counts, control-condition sanity floors and ceiling

### Verifying the data was produced by this code

Every row in `maestro.db` is provenance-stamped via the `environment_id` foreign key into `run_environments`. The `git_commit` column on that table must equal the commit this release tag points at:

```bash
sqlite3 maestro.db "SELECT DISTINCT git_commit FROM run_environments"
# returns: <SHA>
git rev-parse "v1.0.0^{commit}"
# returns: <SHA>  (must match; ^{commit} dereferences the annotated tag to the commit it points at)
```

`git_dirty` should be `0` for every row in the final dataset (rows with `git_dirty=1` are from dev iterations, not the experimental run).

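The same two checks can be scripted so they run against any copy of the database. This is a minimal sketch, not the project's own tooling: the function name `provenance_problems` is hypothetical, and it assumes that `git_dirty` is stored as a column on `run_environments` alongside `git_commit`, and that `git_commit` holds the full SHA.

```python
import sqlite3


def provenance_problems(conn: sqlite3.Connection, expected_commit: str) -> list[str]:
    """Return human-readable problems; an empty list means the DB checks out."""
    problems = []

    # Every environment row should carry exactly the release commit.
    commits = [
        row[0]
        for row in conn.execute("SELECT DISTINCT git_commit FROM run_environments")
    ]
    if commits != [expected_commit]:
        problems.append(f"git_commit values {commits!r} != [{expected_commit!r}]")

    # Assumption: git_dirty lives on run_environments; 0 means a clean tree.
    dirty = conn.execute(
        "SELECT COUNT(*) FROM run_environments WHERE git_dirty != 0"
    ).fetchone()[0]
    if dirty:
        problems.append(f"{dirty} environment row(s) have git_dirty != 0")

    return problems
```

A typical call would pass `sqlite3.connect("maestro.db")` and the commit SHA that the `v1.0.0` tag resolves to.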
### How to reproduce

```bash
git clone https://github.com/Colinho22/maestro.git
cd maestro
git checkout v1.0.0
# Configure API keys in .env (see .env.template)
docker compose up runner
```

## Scope

### In scope:
- Pre-flight checklist verification
- Git tag + GitHub release creation
- Release notes written
- SHA-256 of the final maestro.db attached

### Out of scope:
- Patches / cleanup after the data is gathered → those go to `v1.0.1+` on main, separately. The `v1.0.0` tag stays frozen.
- Attaching the raw maestro.db itself if it contains anything sensitive — start with the SHA-256 only, decide on the full upload separately.

## If the run reveals a bug

Tag v1.0.0 stays frozen. Two paths:

1. Bug is in the runner (data is salvageable, run was incomplete): fix on `main`, tag `v1.0.1`, rerun. Thesis cites `v1.0.1`.
2. Bug is in the strategy or metric code (results are biased / wrong): same — fix, tag `v1.0.1`, rerun, cite `v1.0.1`.
 
Resist the temptation to retroactively edit `v1.0.0`. The whole point is the immutable reference.

## Notes
- `git_dirty=1` rows in the current `maestro.db` (the 8-row sop_based test run) are from dev iterations. The final thesis run produces a separate clean dataset where every row has `git_dirty=0`.
- Consider also tagging a `v1.0.0-rc.1` candidate before the real run, to confirm Docker + the matrix produce sane output on a small subset (e.g. `--repeats 1`) before committing to the full multi-hour run.
