Merged
65 changes: 65 additions & 0 deletions .claude/SESSION_RECAP_2026-05-06.md
@@ -0,0 +1,65 @@
# Session Recap — 2026-05-06

## Goal
Add Mistral and Gemini providers to the experiment matrix. Strategies (CrewAI, LangGraph) deferred to next session.

## What happened, in order

1. **Branch:** Created `feat/expand-strategies-and-providers` off `main`.
2. **Mistral + Gemini providers added:**
   - `src/maestro/providers/mistral.py` (using `mistralai` SDK; note v2.x reorganized imports to `mistralai.client.Mistral`)
   - `src/maestro/providers/gemini.py` (using `google-genai` SDK)
   - Wired into `providers/__init__.py`, `experiment_config.py:MODELS`, `run.py:_create_provider`
   - `.env.template` updated with `MISTRAL_API_KEY` + `GEMINI_API_KEY`
   - `pyproject.toml` deps: `mistralai>=1.5`, `google-genai>=1.0`
   - `coderabbit.yaml` got naming-convention rules for `*Provider` / `*Strategy` suffixes
   - Models: `mistral-small-2603` ($0.15/$0.60), `gemini-2.5-flash-lite` ($0.10/$0.40)
3. **Bug surfaced:** Gemini SOP runs failed with a `json.loads()` error on step 1. Root cause: the provider-level `SYSTEM_PROMPT`, hardcoded to "Mermaid only", leaked into the SOP intermediate steps that ask for JSON. Smaller models (Gemini) followed the system prompt strictly, so the step output was not parseable as JSON; larger models tolerated the mismatch.
4. **Bug fix split off:** Stashed feature work, branched `fix/sop-system-prompt-decoupling` off `main`. Added optional `system_prompt: str | None = None` parameter to `LLMProvider.complete()` and all concrete providers. SOP strategy now passes a JSON-extraction system prompt for steps 1 and 2; step 3 keeps the provider default. Smoke-tested OpenAI + Anthropic, no regressions.
5. **Fix PR (#10) opened, reviewed, merged.** CodeRabbit suggested adding a regression test — declined for that PR because (a) repo has no test infrastructure yet, (b) the proposed assertion was mechanically wrong (it suggested asserting on the user-prompt content, but `system_prompt` is a separate kwarg). Tracked as a follow-up chore.
6. **Repo hygiene PR (#11):** Added `.github/ISSUE_TEMPLATE/` (bug, feature, chore + config.yml). Merged.
7. **GitHub project structure set up:**
   - Milestones: `experimental-artefact` (due 2026-06-01) and `analysis` (due 2026-07-10)
   - Labels: GitHub defaults + custom `chore`
   - Test-setup follow-up issue created and assigned to `experimental-artefact`
8. **Feature branch resumed:** Merged `main` into `feat/expand-strategies-and-providers` (clean, no conflicts), stash popped. Patched Mistral + Gemini providers to accept the new `system_prompt` parameter. Smoke-tested both — 4/4 success including the previously-failing Gemini SOP case.
9. **Feature commit landed locally** as `feat: add Mistral and Gemini providers`. **NOT yet pushed or merged** — user closed session before pushing.
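
The `system_prompt` override from step 4 can be sketched as below. This is a minimal illustration, not the repository's code: only `LLMProvider.complete()` and its `system_prompt: str | None = None` parameter come from the recap, while `DEFAULT_SYSTEM_PROMPT`, `_call`, and `EchoProvider` are hypothetical names introduced here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

# Hypothetical wording; the recap only says the default was "Mermaid only".
DEFAULT_SYSTEM_PROMPT = "Respond with Mermaid only."


class LLMProvider(ABC):
    """Base class; concrete providers implement _call with their own SDK."""

    def complete(self, prompt: str, system_prompt: str | None = None) -> str:
        # Fall back to the provider default only when no override is given.
        effective = DEFAULT_SYSTEM_PROMPT if system_prompt is None else system_prompt
        return self._call(system=effective, user=prompt)

    @abstractmethod
    def _call(self, system: str, user: str) -> str:
        """Send one system/user pair to the model (hypothetical hook)."""


class EchoProvider(LLMProvider):
    """Stand-in provider that echoes its inputs, for illustration only."""

    def _call(self, system: str, user: str) -> str:
        return f"[{system}] {user}"
```

With this shape, a strategy step that needs JSON passes its own prompt (`provider.complete(text, system_prompt="Return JSON only.")`), and a step that wants Mermaid simply omits the argument.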

## Current state

**Branch:** `feat/expand-strategies-and-providers` — has one local commit ahead of remote, not yet pushed.

**Working tree:** clean apart from untracked `*.db` files (not yet covered by `.gitignore`; the user will delete these before final experiment runs).

**Open PRs:** none.

**Pending issues on GitHub:** test-setup chore (`experimental-artefact` milestone).

## Next session — pick up here

1. **Push the feature branch:**
```
git push -u origin feat/expand-strategies-and-providers
```

2. **Open the feature PR** with the body drafted in the prior session (saved in conversation history). Title: `feat: add Mistral and Gemini providers`. Mark Ready for review.

3. **CodeRabbit review:** wait, triage, reply or fix. If rate-limited (1 review/hour), self-review and merge anyway since the PR was smoke-tested live.

4. **After merge:** delete the local branch, sync `main`, then start the strategies work.

5. **Strategies branch (next):** new branch `feat/add-crewai-and-langgraph-strategies` off `main`. User decided to bundle both CrewAI + LangGraph in a single PR rather than split (reasonable for related work; revisit if either turns out to be unexpectedly complex, multi-day work).

## Key decisions / lessons captured to memory

- User runs git commands themselves — give commands as text, don't execute via Bash.
- PR granularity: don't push splitting unless there's a concrete benefit (time-to-merge, dependency ordering, reviewer cognitive load).
- Branch naming convention: `feat/<desc>`, `fix/<desc>`, `chore/<desc>` matching Conventional Commits.
- Bugs found mid-feature → separate `fix/` branch off `main`, merge first, then absorb back into the feature branch.
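
The branch naming convention above could be checked mechanically. The sketch below assumes descriptions are lowercase and hyphen-separated, which matches the branch names in this recap but is not stated as a rule; `BRANCH_RE` and `is_valid_branch` are hypothetical names.

```python
import re

# Assumed pattern: a prefix from the recap, then a lowercase hyphenated description.
BRANCH_RE = re.compile(r"(feat|fix|chore)/[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_branch(name: str) -> bool:
    """Return True when the whole name follows the <prefix>/<desc> convention."""
    return BRANCH_RE.fullmatch(name) is not None
```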

## Open questions / deferred work

- Test infrastructure: zero tests exist yet. Tracked as a chore on GitHub. Set up `tests/` + `conftest.py` + first regression test before final experiment runs.
- `gemini-2.5-flash-lite` is a stable alias, not a dated snapshot. Swap to a dated form before the final experiment runs for thesis-grade reproducibility.
- `*.db` not in `.gitignore` — currently 7 SQLite files show as untracked. User plans to delete them manually before the final experiment, but adding them to `.gitignore` would be cleaner.
- Frontier model addition for the experiment — user wants one frontier model + one cheap model per provider family to compare model depth, but only cheap models are currently registered.
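
A first regression test for the system-prompt fix might look like the sketch below. It assumes a pytest-style test function and uses a fake provider, so no API keys are needed. `RecordingProvider`, `run_sop`, `JSON_SYSTEM_PROMPT`, and the step prompts are all hypothetical stand-ins; the real SOP strategy is not shown in this recap. The assertion targets the `system_prompt` kwarg rather than the user-prompt content.

```python
import json

JSON_SYSTEM_PROMPT = "Return valid JSON only."  # hypothetical wording


class RecordingProvider:
    """Fake provider that records the system_prompt passed on each call."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.system_prompts = []

    def complete(self, prompt, system_prompt=None):
        self.system_prompts.append(system_prompt)
        return self._replies.pop(0)


def run_sop(provider, description):
    """Hypothetical three-step SOP: two JSON steps, then the diagram step."""
    entities = json.loads(
        provider.complete(f"List entities: {description}", system_prompt=JSON_SYSTEM_PROMPT)
    )
    relations = json.loads(
        provider.complete(f"List relations: {entities}", system_prompt=JSON_SYSTEM_PROMPT)
    )
    # Step 3 omits system_prompt so the provider default applies.
    return provider.complete(f"Draw: {entities} {relations}")


def test_sop_overrides_system_prompt_for_json_steps():
    provider = RecordingProvider(['["a", "b"]', '[["a", "b"]]', "flowchart LR\n    a --> b"])
    diagram = run_sop(provider, "a then b")
    assert provider.system_prompts == [JSON_SYSTEM_PROMPT, JSON_SYSTEM_PROMPT, None]
    assert diagram.startswith("flowchart")
```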
56 changes: 56 additions & 0 deletions .claude/maestro_tier_rebalance_worksheet.md
@@ -0,0 +1,56 @@
# MAESTRO — Tier Re-balancing Worksheet (entity counts under Phase 0 contract)
*Generated 2026-06-11. Updated 2026-06-12 after rebalance.*
*entity = inline-drawn node; group = subgraph (pool/lane/boundary/expanded sub-process).*
*Goal: each category needs 5 diagrams in each tier. Bands (fixed): t1 `<10`, t2 `10–25`, t3 `>25` entities.*
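
The fixed bands translate directly into a small classifier. This is a sketch; the function name `tier_for` is an assumption, and only the thresholds come from the worksheet.

```python
def tier_for(entity_count: int) -> int:
    """Map an entity count to a tier: t1 is <10, t2 is 10-25, t3 is >25."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    if entity_count < 10:
        return 1
    if entity_count <= 25:
        return 2
    return 3
```

Both band edges belong to t2: a diagram with exactly 10 or exactly 25 entities is tier 2, and 26 is the smallest t3 count.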

## BPMN

| file | assigned tier | entities (new) | groups | entities+groups | tier by entities | in band? | note |
|---|---|---|---|---|---|---|---|
| 01_bpmn_1 | t1 (<10) | **5** | 0 | 5 | t1 | ✅ | |
| 02_bpmn_1 | t1 (<10) | **8** | 0 | 8 | t1 | ✅ | |
| 03_bpmn_1 | t1 (<10) | **8** | 0 | 8 | t1 | ✅ | |
| 04_bpmn_1 | t1 (<10) | **9** | 0 | 9 | t1 | ✅ | rebalanced: dropped task_2, rewired sub_process via boundary events only |
| 05_bpmn_1 | t1 (<10) | **9** | 0 | 9 | t1 | ✅ | rebalanced: dropped employee-not-found branch, notify tasks, update tasks; merged approved ends |
| 11_bpmn_2 | t2 (10–25) | **15** | 6 | 21 | t2 | ✅ | |
| 12_bpmn_2 | t2 (10–25) | **21** | 5 | 26 | t2 | ✅ | |
| 13_bpmn_2 | t2 (10–25) | **23** | 3 | 26 | t2 | ✅ | rebalanced: dropped IT/Payroll/Facilities pools + 3 intermediate catch events + parallel split/merge in Responsible Dept |
| 14_bpmn_2 | t2 (10–25) | **24** | 4 | 28 | t2 | ✅ | rebalanced: dropped connected-clients subprocess + call activity, replaced corporate rework with B2B referral, dropped document_risk + degenerate merge gateways |
| 15_bpmn_2 | t2 (10–25) | **23** | 3 | 26 | t2 | ✅ | |
| 21_bpmn_3 | t3 (>25) | **29** | 5 | 34 | t3 | ✅ | |
| 22_bpmn_3 | t3 (>25) | **30** | 6 | 36 | t3 | ✅ | |
| 23_bpmn_3 | t3 (>25) | **38** | 2 | 40 | t3 | ✅ | |
| 24_bpmn_3 | t3 (>25) | **26** | 1 | 27 | t3 | ✅ | |
| 25_bpmn_3 | t3 (>25) | **27** | 1 | 28 | t3 | ✅ | rebalanced: added HR-input notify send task + interrupting timer boundary + HR timeout end event |

*Assigned per tier: t1=5 t2=5 t3=5 (target 5/5/5). By new entity count: t1=5 t2=5 t3=5.* ✅

## IT

| file | assigned tier | entities (new) | groups | entities+groups | tier by entities | in band? | note |
|---|---|---|---|---|---|---|---|
| 06_it_1 | t1 (<10) | **4** | 1 | 5 | t1 | ✅ | |
| 07_it_1 | t1 (<10) | **6** | 1 | 7 | t1 | ✅ | |
| 08_it_1 | t1 (<10) | **4** | 3 | 7 | t1 | ✅ | |
| 09_it_1 | t1 (<10) | **6** | 2 | 8 | t1 | ✅ | |
| 10_it_1 | t1 (<10) | **8** | 1 | 9 | t1 | ✅ | |
| 16_it_2 | t2 (10–25) | **11** | 1 | 12 | t2 | ✅ | |
| 17_it_2 | t2 (10–25) | **13** | 1 | 14 | t2 | ✅ | |
| 18_it_2 | t2 (10–25) | **11** | 2 | 13 | t2 | ✅ | |
| 19_it_2 | t2 (10–25) | **12** | 6 | 18 | t2 | ✅ | |
| 20_it_2 | t2 (10–25) | **12** | 2 | 14 | t2 | ✅ | |
| 26_it_3 | t3 (>25) | **28** | 10 | 38 | t3 | ✅ | rebalanced: added CDN edge, per-DC SIEMs in new monitoring zones, bidirectional SIEM replication |
| 27_it_3 | t3 (>25) | **28** | 3 | 31 | t3 | ✅ | |
| 28_it_3 | t3 (>25) | **26** | 2 | 28 | t3 | ✅ | rebalanced: added BigQuery, Pub/Sub DLQ, Error Reporting |
| 29_it_3 | t3 (>25) | **27** | 5 | 32 | t3 | ✅ | rebalanced: split PostHog into Analytics + Error Tracking sub-systems, completed staging mirror (redis + monitoring + log server + worker), added production background worker |
| 30_it_3 | t3 (>25) | **26** | 3 | 29 | t3 | ✅ | rebalanced: added OTA service, schema registry, secret manager (Vault), IAM service, relational DB backup |

*Assigned per tier: t1=5 t2=5 t3=5 (target 5/5/5). By new entity count: t1=5 t2=5 t3=5.* ✅

## Out-of-band diagrams to adjust

*All previously out-of-band diagrams have been rebalanced. No action required.*

*Reference — if tier were defined on entities+groups ("structural size") instead of entities-only:*
- BPMN by size: recompute after rebalance
- IT by size: recompute after rebalance
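
The recomputation noted above can be sketched from the table values. The `(entities, groups)` pairs below are transcribed from the two tables in this worksheet; the helper names `tier_for` and `tally` are assumptions.

```python
from collections import Counter


def tier_for(count: int) -> int:
    """Bands from the worksheet: t1 is <10, t2 is 10-25, t3 is >25."""
    if count < 10:
        return 1
    return 2 if count <= 25 else 3


# (entities, groups) per diagram, in table order.
BPMN = [(5, 0), (8, 0), (8, 0), (9, 0), (9, 0),
        (15, 6), (21, 5), (23, 3), (24, 4), (23, 3),
        (29, 5), (30, 6), (38, 2), (26, 1), (27, 1)]
IT = [(4, 1), (6, 1), (4, 3), (6, 2), (8, 1),
      (11, 1), (13, 1), (11, 2), (12, 6), (12, 2),
      (28, 10), (28, 3), (26, 2), (27, 5), (26, 3)]


def tally(rows, include_groups: bool) -> dict:
    """Count diagrams per tier, by entities only or by entities+groups."""
    sizes = (e + g if include_groups else e for e, g in rows)
    counts = Counter(tier_for(s) for s in sizes)
    return {tier: counts.get(tier, 0) for tier in (1, 2, 3)}
```

With the values as transcribed here, entities-only gives 5/5/5 for both categories. Counting groups as well leaves IT at 5/5/5 but moves BPMN to t1=5, t2=1, t3=9, because four of the five BPMN t2 diagrams reach 26 or more once groups are added.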
75 changes: 75 additions & 0 deletions data/01_bpmn_1.JSON
@@ -0,0 +1,75 @@
{
  "metadata": {
    "id": "bpmn_1_01",
    "source": "A.1.0.bpmn",
    "diagram_type": "bpmn_process",
    "tier": 1,
    "entity_count": 5,
    "container_count": 0,
    "attachment_count": 0,
    "description": "Simple sequential process: Start \u2192 Task 1 \u2192 Task 2 \u2192 Task 3 \u2192 End"
  },
  "nodes": [
    {
      "id": "task_1",
      "name": "Task 1",
      "type": "task",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "task_2",
      "name": "Task 2",
      "type": "task",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "task_3",
      "name": "Task 3",
      "type": "task",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "start_event",
      "name": "Start Event",
      "type": "startEvent",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "end_event",
      "name": "End Event",
      "type": "endEvent",
      "lane": null,
      "attached_to": null
    }
  ],
  "sequence_flows": [
    {
      "id": "sf_1",
      "name": "",
      "source": "start_event",
      "target": "task_1"
    },
    {
      "id": "sf_2",
      "name": "",
      "source": "task_1",
      "target": "task_2"
    },
    {
      "id": "sf_3",
      "name": "",
      "source": "task_2",
      "target": "task_3"
    },
    {
      "id": "sf_4",
      "name": "",
      "source": "task_3",
      "target": "end_event"
    }
  ]
}
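
A consistency check over files of this shape could look like the sketch below. It assumes that `entity_count` equals the number of entries in `nodes`, which holds for the two tier-1 files in this diff but is not confirmed for diagrams with containers or attachments. `validate_diagram` and the trimmed `SAMPLE` document are hypothetical.

```python
from __future__ import annotations

import json

# Trimmed illustration with the same top-level keys as the data files.
SAMPLE = json.loads("""
{
  "metadata": {"id": "example", "entity_count": 3},
  "nodes": [
    {"id": "start_event", "type": "startEvent"},
    {"id": "task_1", "type": "task"},
    {"id": "end_event", "type": "endEvent"}
  ],
  "sequence_flows": [
    {"id": "sf_1", "source": "start_event", "target": "task_1"},
    {"id": "sf_2", "source": "task_1", "target": "end_event"}
  ]
}
""")


def validate_diagram(doc: dict) -> list[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    node_ids = [node["id"] for node in doc["nodes"]]
    if len(set(node_ids)) != len(node_ids):
        problems.append("duplicate node ids")
    declared = doc["metadata"]["entity_count"]
    if declared != len(node_ids):
        problems.append(f"entity_count is {declared} but {len(node_ids)} nodes are listed")
    known = set(node_ids)
    for flow in doc["sequence_flows"]:
        for end in ("source", "target"):
            if flow[end] not in known:
                problems.append(f"{flow['id']}: unknown {end} {flow[end]!r}")
    return problems
```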
15 changes: 15 additions & 0 deletions data/01_bpmn_1_ground_truth.MMD
@@ -0,0 +1,15 @@
---
config:
  theme: default
---
flowchart LR
task_1["Task 1"]
task_2["Task 2"]
task_3["Task 3"]
start_event(["Start Event"])
end_event(["End Event"])

start_event --> task_1
task_1 --> task_2
task_2 --> task_3
task_3 --> end_event
126 changes: 126 additions & 0 deletions data/02_bpmn_1.JSON
@@ -0,0 +1,126 @@
{
  "metadata": {
    "id": "bpmn_1_02",
    "source": "A.2.0.bpmn",
    "diagram_type": "bpmn_process",
    "tier": 1,
    "entity_count": 8,
    "container_count": 0,
    "attachment_count": 0,
    "description": "Process with exclusive gateway splitting into three paths: Task 2 goes directly to End Event, Tasks 3 and 4 merge before End Event"
  },
  "nodes": [
    {
      "id": "start_event",
      "name": "Start Event",
      "type": "startEvent",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "task_1",
      "name": "Task 1",
      "type": "task",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "gw_split",
      "name": "Gateway\n(Split Flow)",
      "type": "exclusiveGateway",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "task_2",
      "name": "Task 2",
      "type": "task",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "task_3",
      "name": "Task 3",
      "type": "task",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "task_4",
      "name": "Task 4",
      "type": "task",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "gw_merge",
      "name": "Gateway\n(Merge Flows)",
      "type": "exclusiveGateway",
      "lane": null,
      "attached_to": null
    },
    {
      "id": "end_event",
      "name": "End Event",
      "type": "endEvent",
      "lane": null,
      "attached_to": null
    }
  ],
  "sequence_flows": [
    {
      "id": "sf_1",
      "name": "",
      "source": "start_event",
      "target": "task_1"
    },
    {
      "id": "sf_2",
      "name": "",
      "source": "task_1",
      "target": "gw_split"
    },
    {
      "id": "sf_3",
      "name": "",
      "source": "gw_split",
      "target": "task_2"
    },
    {
      "id": "sf_4",
      "name": "",
      "source": "gw_split",
      "target": "task_3"
    },
    {
      "id": "sf_5",
      "name": "",
      "source": "gw_split",
      "target": "task_4"
    },
    {
      "id": "sf_6",
      "name": "",
      "source": "task_2",
      "target": "end_event"
    },
    {
      "id": "sf_7",
      "name": "",
      "source": "task_3",
      "target": "gw_merge"
    },
    {
      "id": "sf_8",
      "name": "",
      "source": "task_4",
      "target": "gw_merge"
    },
    {
      "id": "sf_9",
      "name": "",
      "source": "gw_merge",
      "target": "end_event"
    }
  ]
}
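
The `description` field says the gateway splits into three paths. One way to confirm that against `sequence_flows` is to enumerate the simple paths from the start event to the end event, as in this sketch. The edge list is transcribed from the file above; `enumerate_paths` is a hypothetical helper.

```python
from __future__ import annotations

# (source, target) pairs transcribed from sequence_flows sf_1..sf_9.
FLOWS = [
    ("start_event", "task_1"),
    ("task_1", "gw_split"),
    ("gw_split", "task_2"),
    ("gw_split", "task_3"),
    ("gw_split", "task_4"),
    ("task_2", "end_event"),
    ("task_3", "gw_merge"),
    ("task_4", "gw_merge"),
    ("gw_merge", "end_event"),
]


def enumerate_paths(flows, start: str, end: str) -> list[list[str]]:
    """Depth-first enumeration of simple paths from start to end."""
    adjacency: dict[str, list[str]] = {}
    for source, target in flows:
        adjacency.setdefault(source, []).append(target)

    paths: list[list[str]] = []

    def walk(node: str, trail: list[str]) -> None:
        if node == end:
            paths.append(trail)
            return
        for nxt in adjacency.get(node, []):
            if nxt not in trail:  # never revisit a node, so cycles terminate
                walk(nxt, trail + [nxt])

    walk(start, [start])
    return paths
```

`enumerate_paths(FLOWS, "start_event", "end_event")` yields one path through `task_2` straight to the end event and two paths that pass through `gw_merge`, which matches the description.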
23 changes: 23 additions & 0 deletions data/02_bpmn_1_ground_truth.MMD
@@ -0,0 +1,23 @@
---
config:
  theme: default
---
flowchart LR
start_event(["Start Event"])
task_1["Task 1"]
gw_split{"Gateway\n(Split Flow)"}
task_2["Task 2"]
task_3["Task 3"]
task_4["Task 4"]
gw_merge{"Gateway\n(Merge Flows)"}
end_event(["End Event"])

start_event --> task_1
task_1 --> gw_split
gw_split --> task_2
gw_split --> task_3
gw_split --> task_4
task_2 --> end_event
task_3 --> gw_merge
task_4 --> gw_merge
gw_merge --> end_event
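
The relationship between the JSON files and these ground-truth diagrams can be sketched as a small renderer. The type-to-shape mapping below is inferred from the two file pairs in this diff (tasks as rectangles, start/end events as stadiums, exclusive gateways as diamonds); it is an assumption about how the ground truth was produced, not the project's generator. The function name `to_mermaid` is hypothetical, the frontmatter block is omitted, and the four-space indentation is cosmetic.

```python
# Opening and closing delimiters per node type, inferred from the files above.
SHAPES = {
    "task": ('["', '"]'),
    "startEvent": ('(["', '"])'),
    "endEvent": ('(["', '"])'),
    "exclusiveGateway": ('{"', '"}'),
}

NODES = [
    {"id": "start_event", "name": "Start Event", "type": "startEvent"},
    {"id": "gw_split", "name": "Gateway\n(Split Flow)", "type": "exclusiveGateway"},
    {"id": "task_2", "name": "Task 2", "type": "task"},
]
FLOWS = [
    {"source": "start_event", "target": "gw_split"},
    {"source": "gw_split", "target": "task_2"},
]


def to_mermaid(nodes, flows, direction: str = "LR") -> str:
    """Render node declarations, a blank line, then one edge per flow."""
    lines = [f"flowchart {direction}"]
    for node in nodes:
        opener, closer = SHAPES[node["type"]]
        # The ground truth keeps line breaks as a literal backslash-n in labels.
        label = node["name"].replace("\n", "\\n")
        lines.append(f'    {node["id"]}{opener}{label}{closer}')
    lines.append("")
    for flow in flows:
        lines.append(f'    {flow["source"]} --> {flow["target"]}')
    return "\n".join(lines)
```

An unknown node type raises `KeyError` here, which is deliberate in a sketch like this: a silent fallback shape would hide gaps in the mapping.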