Colinho22 · Jun 12, 2026
diff --git a/.claude/SESSION_RECAP_2026-05-06.md b/.claude/SESSION_RECAP_2026-05-06.md
@@ -0,0 +1,65 @@
+# Session Recap — 2026-05-06
+
+## Goal
+Add Mistral and Gemini providers to the experiment matrix. Strategies (CrewAI, LangGraph) deferred to next session.
+
+## What happened, in order
+
+1. **Branch:** Created `feat/expand-strategies-and-providers` off `main`.
+2. **Mistral + Gemini providers added:**
+   - `src/maestro/providers/mistral.py` (using `mistralai` SDK — note v2.x reorganized imports to `mistralai.client.Mistral`)
+   - `src/maestro/providers/gemini.py` (using `google-genai` SDK)
+   - Wired into `providers/__init__.py`, `experiment_config.py:MODELS`, `run.py:_create_provider`
+   - `.env.template` updated with `MISTRAL_API_KEY` + `GEMINI_API_KEY`
+   - `pyproject.toml` deps: `mistralai>=1.5`, `google-genai>=1.0`
+   - `coderabbit.yaml` got naming-convention rules for `*Provider` / `*Strategy` suffixes
+   - Models: `mistral-small-2603` ($0.15/$0.60), `gemini-2.5-flash-lite` ($0.10/$0.40)
+3. **Bug surfaced:** Gemini SOP runs failed with a `json.loads()` error on step 1. Root cause: the provider `SYSTEM_PROMPT`, hardcoded to "Mermaid only", leaked into the SOP intermediate steps that ask for JSON. Smaller models (Gemini) followed the system prompt strictly; larger models tolerated the mismatch.
+4. **Bug fix split off:** Stashed feature work, branched `fix/sop-system-prompt-decoupling` off `main`. Added optional `system_prompt: str | None = None` parameter to `LLMProvider.complete()` and all concrete providers. SOP strategy now passes a JSON-extraction system prompt for steps 1 and 2; step 3 keeps the provider default. Smoke-tested OpenAI + Anthropic, no regressions.
+5. **Fix PR (#10) opened, reviewed, merged.** CodeRabbit suggested adding a regression test — declined for that PR because (a) repo has no test infrastructure yet, (b) the proposed assertion was mechanically wrong (it suggested asserting on the user-prompt content, but `system_prompt` is a separate kwarg). Tracked as a follow-up chore.
+6. **Repo hygiene PR (#11):** Added `.github/ISSUE_TEMPLATE/` (bug, feature, chore + config.yml). Merged.
+7. **GitHub project structure set up:**
+   - Milestones: `experimental-artefact` (due 2026-06-01) and `analysis` (due 2026-07-10)
+   - Labels: GitHub defaults + custom `chore`
+   - Test-setup follow-up issue created and assigned to `experimental-artefact`
+8. **Feature branch resumed:** Merged `main` into `feat/expand-strategies-and-providers` (clean, no conflicts), stash popped. Patched Mistral + Gemini providers to accept the new `system_prompt` parameter. Smoke-tested both — 4/4 success including the previously-failing Gemini SOP case.
+9. **Feature commit landed locally** as `feat: add Mistral and Gemini providers`. **NOT yet pushed or merged** — user closed session before pushing.
+
+## Current state
+
+**Branch:** `feat/expand-strategies-and-providers` — has one local commit ahead of remote, not yet pushed.
+
+**Working tree:** clean apart from untracked `*.db` files (intentional: user will delete these before the final experiment runs).
+
+**Open PRs:** none.
+
+**Pending issues on GitHub:** test-setup chore (`experimental-artefact` milestone).
+
+## Next session — pick up here
+
+1. **Push the feature branch:**
+   ```shell
+   git push -u origin feat/expand-strategies-and-providers
+   ```
+
+2. **Open the feature PR** with the body drafted in the prior session (saved in conversation history). Title: `feat: add Mistral and Gemini providers`. Mark Ready for review.
+
+3. **CodeRabbit review:** wait, triage, reply or fix. If rate-limited (1 review/hour), self-review and merge anyway since the PR was smoke-tested live.
+
+4. **After merge:** delete the local branch, sync `main`, then start the strategies work.
+
+5. **Strategies branch (next):** new branch `feat/add-crewai-and-langgraph-strategies` off `main`. User decided to bundle both CrewAI + LangGraph in a single PR rather than split (reasonable for related work; revisit if either turns out to be multi-day surprise complexity).
+
+## Key decisions / lessons captured to memory
+
+- User runs git commands themselves — give commands as text, don't execute via Bash.
+- PR granularity: don't push splitting unless there's a concrete benefit (time-to-merge, dependency ordering, reviewer cognitive load).
+- Branch naming convention: `feat/<desc>`, `fix/<desc>`, `chore/<desc>` matching Conventional Commits.
+- Bugs found mid-feature → separate `fix/` branch off `main`, merge first, then absorb back into the feature branch.
+
+## Open questions / deferred work
+
+- Test infrastructure: zero tests exist yet. Tracked as a chore on GitHub. Set up `tests/` + `conftest.py` + first regression test before final experiment runs.
+- `gemini-2.5-flash-lite` is a stable alias, not a dated snapshot. Swap to a dated form before the final experiment runs for thesis-grade reproducibility.
+- `*.db` not in `.gitignore` — currently 7 SQLite files show as untracked. User plans to delete them manually before the final experiment, but adding them to `.gitignore` would be cleaner.
+- Frontier model addition for the experiment — user wants one frontier model + one cheap model per provider family to compare model depth, but only cheap models are currently registered.
diff --git a/.claude/maestro_tier_rebalance_worksheet.md b/.claude/maestro_tier_rebalance_worksheet.md
@@ -0,0 +1,56 @@
+# MAESTRO — Tier Re-balancing Worksheet (entity counts under Phase 0 contract)
+*Generated 2026-06-11. Updated 2026-06-12 after rebalance.*
+*entity = inline-drawn node; group = subgraph (pool/lane/boundary/expanded sub-process).*
+*Goal: each category needs 5 diagrams in each tier. Bands (fixed): t1 `<10`, t2 `10–25`, t3 `>25` entities.*
+
+## BPMN
+
+| file | assigned tier | entities (new) | groups | entities+groups | tier by entities | in band? | note |
+|---|---|---|---|---|---|---|---|
+| 01_bpmn_1 | t1 (<10) | **5** | 0 | 5 | t1 | ✅ |  |
+| 02_bpmn_1 | t1 (<10) | **8** | 0 | 8 | t1 | ✅ |  |
+| 03_bpmn_1 | t1 (<10) | **8** | 0 | 8 | t1 | ✅ |  |
+| 04_bpmn_1 | t1 (<10) | **9** | 0 | 9 | t1 | ✅ | rebalanced: dropped task_2, rewired sub_process via boundary events only |
+| 05_bpmn_1 | t1 (<10) | **9** | 0 | 9 | t1 | ✅ | rebalanced: dropped employee-not-found branch, notify tasks, update tasks; merged approved ends |
+| 11_bpmn_2 | t2 (10–25) | **15** | 6 | 21 | t2 | ✅ |  |
+| 12_bpmn_2 | t2 (10–25) | **21** | 5 | 26 | t2 | ✅ |  |
+| 13_bpmn_2 | t2 (10–25) | **23** | 3 | 26 | t2 | ✅ | rebalanced: dropped IT/Payroll/Facilities pools + 3 icatch events + parallel split/merge in Responsible Dept |
+| 14_bpmn_2 | t2 (10–25) | **24** | 4 | 28 | t2 | ✅ | rebalanced: dropped connected-clients subprocess + call activity, replaced corporate rework with B2B referral, dropped document_risk + degenerate merge gateways |
+| 15_bpmn_2 | t2 (10–25) | **23** | 3 | 26 | t2 | ✅ |  |
+| 21_bpmn_3 | t3 (>25) | **29** | 5 | 34 | t3 | ✅ |  |
+| 22_bpmn_3 | t3 (>25) | **30** | 6 | 36 | t3 | ✅ |  |
+| 23_bpmn_3 | t3 (>25) | **38** | 2 | 40 | t3 | ✅ |  |
+| 24_bpmn_3 | t3 (>25) | **26** | 1 | 27 | t3 | ✅ |  |
+| 25_bpmn_3 | t3 (>25) | **27** | 1 | 28 | t3 | ✅ | rebalanced: added HR-input notify send task + interrupting timer boundary + HR timeout end event |
+
+*Assigned per tier: t1=5 t2=5 t3=5 (target 5/5/5). By new entity count: t1=5 t2=5 t3=5.* ✅
+
+## IT
+
+| file | assigned tier | entities (new) | groups | entities+groups | tier by entities | in band? | note |
+|---|---|---|---|---|---|---|---|
+| 06_it_1 | t1 (<10) | **4** | 1 | 5 | t1 | ✅ |  |
+| 07_it_1 | t1 (<10) | **6** | 1 | 7 | t1 | ✅ |  |
+| 08_it_1 | t1 (<10) | **4** | 3 | 7 | t1 | ✅ |  |
+| 09_it_1 | t1 (<10) | **6** | 2 | 8 | t1 | ✅ |  |
+| 10_it_1 | t1 (<10) | **8** | 1 | 9 | t1 | ✅ |  |
+| 16_it_2 | t2 (10–25) | **11** | 1 | 12 | t2 | ✅ |  |
+| 17_it_2 | t2 (10–25) | **13** | 1 | 14 | t2 | ✅ |  |
+| 18_it_2 | t2 (10–25) | **11** | 2 | 13 | t2 | ✅ |  |
+| 19_it_2 | t2 (10–25) | **12** | 6 | 18 | t2 | ✅ |  |
+| 20_it_2 | t2 (10–25) | **12** | 2 | 14 | t2 | ✅ |  |
+| 26_it_3 | t3 (>25) | **28** | 10 | 38 | t3 | ✅ | rebalanced: added CDN edge, per-DC SIEMs in new monitoring zones, bidirectional SIEM replication |
+| 27_it_3 | t3 (>25) | **28** | 3 | 31 | t3 | ✅ |  |
+| 28_it_3 | t3 (>25) | **26** | 2 | 28 | t3 | ✅ | rebalanced: added BigQuery, Pub/Sub DLQ, Error Reporting |
+| 29_it_3 | t3 (>25) | **27** | 5 | 32 | t3 | ✅ | rebalanced: split PostHog into Analytics + Error Tracking sub-systems, completed staging mirror (redis + monitoring + log server + worker), added production background worker |
+| 30_it_3 | t3 (>25) | **26** | 3 | 29 | t3 | ✅ | rebalanced: added OTA service, schema registry, secret manager (Vault), IAM service, relational DB backup |
+
+*Assigned per tier: t1=5 t2=5 t3=5 (target 5/5/5). By new entity count: t1=5 t2=5 t3=5.* ✅
+
+## Out-of-band diagrams to adjust
+
+*All previously out-of-band diagrams have been rebalanced. No action required.*
+
+*Reference — if tier were defined on entities+groups ("structural size") instead of entities-only:*
+- BPMN by size: t1=5 t2=1 t3=9 (only 11_bpmn_2 stays in t2; 12_bpmn_2 to 15_bpmn_2 move to t3 at 26–28)
+- IT by size: t1=5 t2=5 t3=5 (same as entities-only)
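The band rules in this worksheet can be checked mechanically. A minimal sketch with the counts copied from the two tables; `tier_for`, `tier_histogram`, `out_of_band` and the dict layout are hypothetical, and reading the assigned tier from the filename suffix (`_1`, `_2`, `_3`) is an assumption based on the naming used in this PR.

```python
def tier_for(count: int) -> int:
    """Fixed bands from the worksheet: t1 <10, t2 10-25 (inclusive), t3 >25."""
    if count < 10:
        return 1
    if count <= 25:
        return 2
    return 3


def tier_histogram(counts: list[int]) -> dict[int, int]:
    """How many diagrams land in each tier."""
    hist = {1: 0, 2: 0, 3: 0}
    for c in counts:
        hist[tier_for(c)] += 1
    return hist


# (entities, groups) per diagram, copied from the tables above.
BPMN = {
    "01_bpmn_1": (5, 0), "02_bpmn_1": (8, 0), "03_bpmn_1": (8, 0),
    "04_bpmn_1": (9, 0), "05_bpmn_1": (9, 0), "11_bpmn_2": (15, 6),
    "12_bpmn_2": (21, 5), "13_bpmn_2": (23, 3), "14_bpmn_2": (24, 4),
    "15_bpmn_2": (23, 3), "21_bpmn_3": (29, 5), "22_bpmn_3": (30, 6),
    "23_bpmn_3": (38, 2), "24_bpmn_3": (26, 1), "25_bpmn_3": (27, 1),
}
IT = {
    "06_it_1": (4, 1), "07_it_1": (6, 1), "08_it_1": (4, 3),
    "09_it_1": (6, 2), "10_it_1": (8, 1), "16_it_2": (11, 1),
    "17_it_2": (13, 1), "18_it_2": (11, 2), "19_it_2": (12, 6),
    "20_it_2": (12, 2), "26_it_3": (28, 10), "27_it_3": (28, 3),
    "28_it_3": (26, 2), "29_it_3": (27, 5), "30_it_3": (26, 3),
}


def out_of_band(table: dict[str, tuple[int, int]]) -> list[str]:
    """Diagrams whose entity count disagrees with the tier suffix in their name."""
    return [name for name, (entities, _) in table.items()
            if tier_for(entities) != int(name.rsplit("_", 1)[1])]


for label, table in (("BPMN", BPMN), ("IT", IT)):
    by_entities = tier_histogram([e for e, _ in table.values()])
    by_size = tier_histogram([e + g for e, g in table.values()])
    print(label, by_entities, by_size, out_of_band(table))
# BPMN {1: 5, 2: 5, 3: 5} {1: 5, 2: 1, 3: 9} []
# IT {1: 5, 2: 5, 3: 5} {1: 5, 2: 5, 3: 5} []
```

Keeping the band boundaries in one function means a later change to the bands (for example making 25 exclusive) only touches `tier_for`.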
diff --git a/data/01_bpmn_1.JSON b/data/01_bpmn_1.JSON
@@ -0,0 +1,75 @@
+{
+  "metadata": {
+    "id": "bpmn_1_01",
+    "source": "A.1.0.bpmn",
+    "diagram_type": "bpmn_process",
+    "tier": 1,
+    "entity_count": 5,
+    "container_count": 0,
+    "attachment_count": 0,
+    "description": "Simple sequential process: Start \u2192 Task 1 \u2192 Task 2 \u2192 Task 3 \u2192 End"
+  },
+  "nodes": [
+    {
+      "id": "task_1",
+      "name": "Task 1",
+      "type": "task",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "task_2",
+      "name": "Task 2",
+      "type": "task",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "task_3",
+      "name": "Task 3",
+      "type": "task",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "start_event",
+      "name": "Start Event",
+      "type": "startEvent",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "end_event",
+      "name": "End Event",
+      "type": "endEvent",
+      "lane": null,
+      "attached_to": null
+    }
+  ],
+  "sequence_flows": [
+    {
+      "id": "sf_1",
+      "name": "",
+      "source": "start_event",
+      "target": "task_1"
+    },
+    {
+      "id": "sf_2",
+      "name": "",
+      "source": "task_1",
+      "target": "task_2"
+    },
+    {
+      "id": "sf_3",
+      "name": "",
+      "source": "task_2",
+      "target": "task_3"
+    },
+    {
+      "id": "sf_4",
+      "name": "",
+      "source": "task_3",
+      "target": "end_event"
+    }
+  ]
+}
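The `metadata` counts in these records can drift from the `nodes` and `sequence_flows` arrays as diagrams are rebalanced. A minimal consistency check, assuming `entity_count` should equal the number of `nodes` entries (true for `01_bpmn_1` and `02_bpmn_1`) and that every flow endpoint must be a declared node id; `validate_diagram` is a hypothetical helper, not part of the repo.

```python
def validate_diagram(doc: dict) -> list[str]:
    """Hypothetical check: return problems for one diagram record; empty means consistent."""
    problems: list[str] = []
    node_ids = [node["id"] for node in doc["nodes"]]

    # Assumption: every entry in `nodes` counts as exactly one entity.
    declared = doc["metadata"]["entity_count"]
    if declared != len(node_ids):
        problems.append(f"entity_count is {declared} but {len(node_ids)} nodes are listed")

    if len(set(node_ids)) != len(node_ids):
        problems.append("duplicate node ids")

    # Every sequence flow must connect two declared nodes.
    known = set(node_ids)
    for flow in doc["sequence_flows"]:
        for end in ("source", "target"):
            if flow[end] not in known:
                problems.append(f"{flow['id']}: unknown {end} {flow[end]!r}")
    return problems
```

Passing the parsed file, e.g. `validate_diagram(json.load(open("data/01_bpmn_1.JSON")))`, returns an empty list for the record above.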
diff --git a/data/01_bpmn_1_ground_truth.MMD b/data/01_bpmn_1_ground_truth.MMD
@@ -0,0 +1,15 @@
+---
+config:
+  theme: default
+---
+flowchart LR
+    task_1["Task 1"]
+    task_2["Task 2"]
+    task_3["Task 3"]
+    start_event(["Start Event"])
+    end_event(["End Event"])
+
+    start_event --> task_1
+    task_1 --> task_2
+    task_2 --> task_3
+    task_3 --> end_event
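The ground-truth `.MMD` files follow the JSON records closely: one node line per `nodes` entry in the same order, a blank line, then one `-->` edge per `sequence_flows` entry. A sketch of that mapping, assuming the shapes seen in this PR (`task` as a rectangle, start/end events as stadium `([...])`, `exclusiveGateway` as a rhombus) and the `-->|label|` syntax for named flows, which none of the flows here use; `to_mermaid` and `SHAPES` are hypothetical names, not the project's generator.

```python
# Hypothetical shape templates per BPMN node type, inferred from the ground-truth files.
SHAPES = {
    "task": '{id}["{label}"]',
    "startEvent": '{id}(["{label}"])',
    "endEvent": '{id}(["{label}"])',
    "exclusiveGateway": '{id}{{"{label}"}}',  # doubled braces produce literal { }
}

HEADER = "---\nconfig:\n  theme: default\n---\nflowchart LR"


def to_mermaid(doc: dict) -> str:
    """Render a diagram record in the same layout as the ground-truth files."""
    lines = [HEADER]
    for node in doc["nodes"]:
        # json.loads turns "\n" into a real newline; write it back as the two characters \n.
        label = node["name"].replace("\n", "\\n")
        # Unknown node types raise KeyError instead of being silently skipped.
        lines.append("    " + SHAPES[node["type"]].format(id=node["id"], label=label))
    lines.append("")
    for flow in doc["sequence_flows"]:
        # Assumption: named flows become edge labels; all flows in this PR are unnamed.
        arrow = f"-->|{flow['name']}|" if flow["name"] else "-->"
        lines.append(f"    {flow['source']} {arrow} {flow['target']}")
    return "\n".join(lines)
```

Applied to the `01_bpmn_1` record, this reproduces the ground-truth file above line for line.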
diff --git a/data/02_bpmn_1.JSON b/data/02_bpmn_1.JSON
@@ -0,0 +1,126 @@
+{
+  "metadata": {
+    "id": "bpmn_1_02",
+    "source": "A.2.0.bpmn",
+    "diagram_type": "bpmn_process",
+    "tier": 1,
+    "entity_count": 8,
+    "container_count": 0,
+    "attachment_count": 0,
+    "description": "Process with exclusive gateway splitting into three paths: Task 2 goes directly to End Event, Tasks 3 and 4 merge before End Event"
+  },
+  "nodes": [
+    {
+      "id": "start_event",
+      "name": "Start Event",
+      "type": "startEvent",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "task_1",
+      "name": "Task 1",
+      "type": "task",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "gw_split",
+      "name": "Gateway\n(Split Flow)",
+      "type": "exclusiveGateway",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "task_2",
+      "name": "Task 2",
+      "type": "task",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "task_3",
+      "name": "Task 3",
+      "type": "task",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "task_4",
+      "name": "Task 4",
+      "type": "task",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "gw_merge",
+      "name": "Gateway\n(Merge Flows)",
+      "type": "exclusiveGateway",
+      "lane": null,
+      "attached_to": null
+    },
+    {
+      "id": "end_event",
+      "name": "End Event",
+      "type": "endEvent",
+      "lane": null,
+      "attached_to": null
+    }
+  ],
+  "sequence_flows": [
+    {
+      "id": "sf_1",
+      "name": "",
+      "source": "start_event",
+      "target": "task_1"
+    },
+    {
+      "id": "sf_2",
+      "name": "",
+      "source": "task_1",
+      "target": "gw_split"
+    },
+    {
+      "id": "sf_3",
+      "name": "",
+      "source": "gw_split",
+      "target": "task_2"
+    },
+    {
+      "id": "sf_4",
+      "name": "",
+      "source": "gw_split",
+      "target": "task_3"
+    },
+    {
+      "id": "sf_5",
+      "name": "",
+      "source": "gw_split",
+      "target": "task_4"
+    },
+    {
+      "id": "sf_6",
+      "name": "",
+      "source": "task_2",
+      "target": "end_event"
+    },
+    {
+      "id": "sf_7",
+      "name": "",
+      "source": "task_3",
+      "target": "gw_merge"
+    },
+    {
+      "id": "sf_8",
+      "name": "",
+      "source": "task_4",
+      "target": "gw_merge"
+    },
+    {
+      "id": "sf_9",
+      "name": "",
+      "source": "gw_merge",
+      "target": "end_event"
+    }
+  ]
+}
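The `description` of `02_bpmn_1` makes structural claims (one split into three paths, Tasks 3 and 4 rejoining at a merge) that can be read straight off the flow list. A sketch that computes in/out degree per gateway, assuming a gateway is a split when it has one incoming and several outgoing flows and a merge when the reverse holds; `gateway_roles` is a hypothetical helper.

```python
from collections import Counter


def gateway_roles(nodes: dict[str, str],
                  flows: list[tuple[str, str]]) -> dict[str, tuple[str, int, int]]:
    """Hypothetical helper: map each gateway id to (role, in_degree, out_degree)."""
    in_deg = Counter(target for _, target in flows)
    out_deg = Counter(source for source, _ in flows)
    roles = {}
    for node_id, node_type in nodes.items():
        if not node_type.endswith("Gateway"):
            continue
        i, o = in_deg[node_id], out_deg[node_id]  # Counter returns 0 for missing ids
        role = "split" if i == 1 and o > 1 else "merge" if i > 1 and o == 1 else "other"
        roles[node_id] = (role, i, o)
    return roles


# Node types and (source, target) flows copied from 02_bpmn_1.JSON.
NODES_02 = {
    "start_event": "startEvent", "task_1": "task", "gw_split": "exclusiveGateway",
    "task_2": "task", "task_3": "task", "task_4": "task",
    "gw_merge": "exclusiveGateway", "end_event": "endEvent",
}
FLOWS_02 = [
    ("start_event", "task_1"), ("task_1", "gw_split"),
    ("gw_split", "task_2"), ("gw_split", "task_3"), ("gw_split", "task_4"),
    ("task_2", "end_event"), ("task_3", "gw_merge"), ("task_4", "gw_merge"),
    ("gw_merge", "end_event"),
]

print(gateway_roles(NODES_02, FLOWS_02))
# {'gw_split': ('split', 1, 3), 'gw_merge': ('merge', 2, 1)}
```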
diff --git a/data/02_bpmn_1_ground_truth.MMD b/data/02_bpmn_1_ground_truth.MMD
@@ -0,0 +1,23 @@
+---
+config:
+  theme: default
+---
+flowchart LR
+    start_event(["Start Event"])
+    task_1["Task 1"]
+    gw_split{"Gateway\n(Split Flow)"}
+    task_2["Task 2"]
+    task_3["Task 3"]
+    task_4["Task 4"]
+    gw_merge{"Gateway\n(Merge Flows)"}
+    end_event(["End Event"])
+
+    start_event --> task_1
+    task_1 --> gw_split
+    gw_split --> task_2
+    gw_split --> task_3
+    gw_split --> task_4
+    task_2 --> end_event
+    task_3 --> gw_merge
+    task_4 --> gw_merge
+    gw_merge --> end_event
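Because each ground-truth `.MMD` is meant to mirror its JSON record, the two edge lists can be cross-checked. A minimal sketch that pulls `a --> b` edges (optionally with a `|label|`) out of Mermaid text with a regular expression and compares them to `sequence_flows` as sets; the regex only covers the plain single-line `-->` form used in these files, `mermaid_edges` / `json_edges` are hypothetical helpers, and comparing sets ignores duplicate flows between the same pair of nodes.

```python
import re

# Matches "src --> dst" or "src -->|label| dst" on a single line.
EDGE_RE = re.compile(r"^\s*(\w+)\s*-->\s*(?:\|[^|]*\|\s*)?(\w+)\s*$")


def mermaid_edges(text: str) -> set[tuple[str, str]]:
    """Hypothetical helper: (source, target) pairs from simple flowchart edge lines."""
    edges = set()
    for line in text.splitlines():
        match = EDGE_RE.match(line)
        if match:
            edges.add((match.group(1), match.group(2)))
    return edges


def json_edges(doc: dict) -> set[tuple[str, str]]:
    """Hypothetical helper: (source, target) pairs from a record's sequence_flows."""
    return {(flow["source"], flow["target"]) for flow in doc["sequence_flows"]}


# Excerpt of 02_bpmn_1_ground_truth.MMD (node lines are ignored by the regex).
MMD_02 = """flowchart LR
    gw_split{"Gateway\\n(Split Flow)"}
    start_event --> task_1
    task_1 --> gw_split
    gw_split --> task_2
    gw_split --> task_3
    gw_split --> task_4
    task_2 --> end_event
    task_3 --> gw_merge
    task_4 --> gw_merge
    gw_merge --> end_event
"""

# Flows from 02_bpmn_1.JSON, reduced to the fields the comparison needs.
DOC_02 = {"sequence_flows": [{"source": s, "target": t} for s, t in [
    ("start_event", "task_1"), ("task_1", "gw_split"), ("gw_split", "task_2"),
    ("gw_split", "task_3"), ("gw_split", "task_4"), ("task_2", "end_event"),
    ("task_3", "gw_merge"), ("task_4", "gw_merge"), ("gw_merge", "end_event"),
]]}

missing = json_edges(DOC_02) - mermaid_edges(MMD_02)
extra = mermaid_edges(MMD_02) - json_edges(DOC_02)
print(missing, extra)  # set() set()
```

The same comparison could in principle be pointed at model-generated Mermaid, though generated output may use arrow forms this regex does not cover.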