diff --git a/.claude/SESSION_RECAP_2026-05-06.md b/.claude/SESSION_RECAP_2026-05-06.md new file mode 100644 index 0000000..df2ef52 --- /dev/null +++ b/.claude/SESSION_RECAP_2026-05-06.md @@ -0,0 +1,65 @@ +# Session Recap — 2026-05-06 + +## Goal +Add Mistral and Gemini providers to the experiment matrix. Strategies (CrewAI, LangGraph) deferred to next session. + +## What happened, in order + +1. **Branch:** Created `feat/expand-strategies-and-providers` off `main`. +2. **Mistral + Gemini providers added:** + - `src/maestro/providers/mistral.py` (using `mistralai` SDK — note v2.x reorganized imports to `mistralai.client.Mistral`) + - `src/maestro/providers/gemini.py` (using `google-genai` SDK) + - Wired into `providers/__init__.py`, `experiment_config.py:MODELS`, `run.py:_create_provider` + - `.env.template` updated with `MISTRAL_API_KEY` + `GEMINI_API_KEY` + - `pyproject.toml` deps: `mistralai>=1.5`, `google-genai>=1.0` + - `coderabbit.yaml` got naming-convention rules for `*Provider` / `*Strategy` suffixes + - Models: `mistral-small-2603` ($0.15/$0.60), `gemini-2.5-flash-lite` ($0.10/$0.40) +3. **Bug surfaced:** Gemini SOP runs failed with `json.loads()` error on step 1. Root cause: provider `SYSTEM_PROMPT` hardcoded to "Mermaid only" leaks into SOP intermediate steps that ask for JSON. Smaller models (Gemini) follow the system prompt strictly; larger models tolerated the mismatch. +4. **Bug fix split off:** Stashed feature work, branched `fix/sop-system-prompt-decoupling` off `main`. Added optional `system_prompt: str | None = None` parameter to `LLMProvider.complete()` and all concrete providers. SOP strategy now passes a JSON-extraction system prompt for steps 1 and 2; step 3 keeps the provider default. Smoke-tested OpenAI + Anthropic, no regressions. +5. 
**Fix PR (#10) opened, reviewed, merged.** CodeRabbit suggested adding a regression test — declined for that PR because (a) repo has no test infrastructure yet, (b) the proposed assertion was mechanically wrong (it suggested asserting on the user-prompt content, but `system_prompt` is a separate kwarg). Tracked as a follow-up chore. +6. **Repo hygiene PR (#11):** Added `.github/ISSUE_TEMPLATE/` (bug, feature, chore + config.yml). Merged. +7. **GitHub project structure set up:** + - Milestones: `experimental-artefact` (due 2026-06-01) and `analysis` (due 2026-07-10) + - Labels: GitHub defaults + custom `chore` + - Test-setup follow-up issue created and assigned to `experimental-artefact` +8. **Feature branch resumed:** Merged `main` into `feat/expand-strategies-and-providers` (clean, no conflicts), stash popped. Patched Mistral + Gemini providers to accept the new `system_prompt` parameter. Smoke-tested both — 4/4 success including the previously-failing Gemini SOP case. +9. **Feature commit landed locally** as `feat: add Mistral and Gemini providers`. **NOT yet pushed or merged** — user closed session before pushing. + +## Current state + +**Branch:** `feat/expand-strategies-and-providers` — has one local commit ahead of remote, not yet pushed. + +**Working tree:** clean apart from untracked `*.db` files (intentional — user will delete these before final experiment runs). + +**Open PRs:** none. + +**Pending issues on GitHub:** test-setup chore (`experimental-artefact` milestone). + +## Next session — pick up here + +1. **Push the feature branch:** + ``` + git push -u origin feat/expand-strategies-and-providers + ``` + +2. **Open the feature PR** with the body drafted in the prior session (saved in conversation history). Title: `feat: add Mistral and Gemini providers`. Mark Ready for review. + +3. **CodeRabbit review:** wait, triage, reply or fix. If rate-limited (1 review/hour), self-review and merge anyway since the PR was smoke-tested live. + +4. 
**After merge:** delete the local branch, sync `main`, then start the strategies work. + +5. **Strategies branch (next):** new branch `feat/add-crewai-and-langgraph-strategies` off `main`. User decided to bundle both CrewAI + LangGraph in a single PR rather than split (reasonable for related work; revisit if either turns out to be multi-day surprise complexity). + +## Key decisions / lessons captured to memory + +- User runs git commands themselves — give commands as text, don't execute via Bash. +- PR granularity: don't push splitting unless there's a concrete benefit (time-to-merge, dependency ordering, reviewer cognitive load). +- Branch naming convention: `feat/`, `fix/`, `chore/` matching Conventional Commits. +- Bugs found mid-feature → separate `fix/` branch off `main`, merge first, then absorb back into the feature branch. + +## Open questions / deferred work + +- Test infrastructure: zero tests exist yet. Tracked as a chore on GitHub. Set up `tests/` + `conftest.py` + first regression test before final experiment runs. +- `gemini-2.5-flash-lite` is a stable alias, not a dated snapshot. Swap to a dated form before the final experiment runs for thesis-grade reproducibility. +- `*.db` not in `.gitignore` — currently 7 SQLite files show as untracked. User plans to delete them manually before the final experiment, but adding them to `.gitignore` would be cleaner. +- Frontier model addition for the experiment — user wants one frontier model + one cheap model per provider family to compare model depth, but only cheap models are currently registered. \ No newline at end of file diff --git a/.claude/maestro_tier_rebalance_worksheet.md b/.claude/maestro_tier_rebalance_worksheet.md new file mode 100644 index 0000000..ece19aa --- /dev/null +++ b/.claude/maestro_tier_rebalance_worksheet.md @@ -0,0 +1,56 @@ +# MAESTRO — Tier Re-balancing Worksheet (entity counts under Phase 0 contract) +*Generated 2026-06-11. 
Updated 2026-06-12 after rebalance.* +*entity = inline-drawn node; group = subgraph (pool/lane/boundary/expanded sub-process).* +*Goal: each category needs 5 diagrams in each tier. Bands (fixed): t1 `<10`, t2 `10–25`, t3 `>25` entities.* + +## BPMN + +| file | assigned tier | entities (new) | groups | entities+groups | tier by entities | in band? | note | +|---|---|---|---|---|---|---|---| +| 01_bpmn_1 | t1 (<10) | **5** | 0 | 5 | t1 | ✅ | | +| 02_bpmn_1 | t1 (<10) | **8** | 0 | 8 | t1 | ✅ | | +| 03_bpmn_1 | t1 (<10) | **8** | 0 | 8 | t1 | ✅ | | +| 04_bpmn_1 | t1 (<10) | **9** | 0 | 9 | t1 | ✅ | rebalanced: dropped task_2, rewired sub_process via boundary events only | +| 05_bpmn_1 | t1 (<10) | **9** | 0 | 9 | t1 | ✅ | rebalanced: dropped employee-not-found branch, notify tasks, update tasks; merged approved ends | +| 11_bpmn_2 | t2 (10–25) | **15** | 6 | 21 | t2 | ✅ | | +| 12_bpmn_2 | t2 (10–25) | **21** | 5 | 26 | t2 | ✅ | | +| 13_bpmn_2 | t2 (10–25) | **23** | 3 | 26 | t2 | ✅ | rebalanced: dropped IT/Payroll/Facilities pools + 3 icatch events + parallel split/merge in Responsible Dept | +| 14_bpmn_2 | t2 (10–25) | **24** | 4 | 28 | t2 | ✅ | rebalanced: dropped connected-clients subprocess + call activity, replaced corporate rework with B2B referral, dropped document_risk + degenerate merge gateways | +| 15_bpmn_2 | t2 (10–25) | **23** | 3 | 26 | t2 | ✅ | | +| 21_bpmn_3 | t3 (>25) | **29** | 5 | 34 | t3 | ✅ | | +| 22_bpmn_3 | t3 (>25) | **30** | 6 | 36 | t3 | ✅ | | +| 23_bpmn_3 | t3 (>25) | **38** | 2 | 40 | t3 | ✅ | | +| 24_bpmn_3 | t3 (>25) | **26** | 1 | 27 | t3 | ✅ | | +| 25_bpmn_3 | t3 (>25) | **27** | 1 | 28 | t3 | ✅ | rebalanced: added HR-input notify send task + interrupting timer boundary + HR timeout end event | + +*Assigned per tier: t1=5 t2=5 t3=5 (target 5/5/5). By new entity count: t1=5 t2=5 t3=5.* ✅ + +## IT + +| file | assigned tier | entities (new) | groups | entities+groups | tier by entities | in band? 
| note | +|---|---|---|---|---|---|---|---| +| 06_it_1 | t1 (<10) | **4** | 1 | 5 | t1 | ✅ | | +| 07_it_1 | t1 (<10) | **6** | 1 | 7 | t1 | ✅ | | +| 08_it_1 | t1 (<10) | **4** | 3 | 7 | t1 | ✅ | | +| 09_it_1 | t1 (<10) | **6** | 2 | 8 | t1 | ✅ | | +| 10_it_1 | t1 (<10) | **8** | 1 | 9 | t1 | ✅ | | +| 16_it_2 | t2 (10–25) | **11** | 1 | 12 | t2 | ✅ | | +| 17_it_2 | t2 (10–25) | **13** | 1 | 14 | t2 | ✅ | | +| 18_it_2 | t2 (10–25) | **11** | 2 | 13 | t2 | ✅ | | +| 19_it_2 | t2 (10–25) | **12** | 6 | 18 | t2 | ✅ | | +| 20_it_2 | t2 (10–25) | **12** | 2 | 14 | t2 | ✅ | | +| 26_it_3 | t3 (>25) | **28** | 10 | 38 | t3 | ✅ | rebalanced: added CDN edge, per-DC SIEMs in new monitoring zones, bidirectional SIEM replication | +| 27_it_3 | t3 (>25) | **28** | 3 | 31 | t3 | ✅ | | +| 28_it_3 | t3 (>25) | **26** | 2 | 28 | t3 | ✅ | rebalanced: added BigQuery, Pub/Sub DLQ, Error Reporting | +| 29_it_3 | t3 (>25) | **27** | 5 | 32 | t3 | ✅ | rebalanced: split PostHog into Analytics + Error Tracking sub-systems, completed staging mirror (redis + monitoring + log server + worker), added production background worker | +| 30_it_3 | t3 (>25) | **26** | 3 | 29 | t3 | ✅ | rebalanced: added OTA service, schema registry, secret manager (Vault), IAM service, relational DB backup | + +*Assigned per tier: t1=5 t2=5 t3=5 (target 5/5/5). By new entity count: t1=5 t2=5 t3=5.* ✅ + +## Out-of-band diagrams to adjust + +*All previously out-of-band diagrams have been rebalanced. 
No action required.* + +*Reference — if tier were defined on entities+groups ("structural size") instead of entities-only:* +- BPMN by size: recompute after rebalance +- IT by size: recompute after rebalance \ No newline at end of file diff --git a/data/01_bpmn_1.JSON b/data/01_bpmn_1.JSON new file mode 100644 index 0000000..ba430ba --- /dev/null +++ b/data/01_bpmn_1.JSON @@ -0,0 +1,75 @@ +{ + "metadata": { + "id": "bpmn_1_01", + "source": "A.1.0.bpmn", + "diagram_type": "bpmn_process", + "tier": 1, + "entity_count": 5, + "container_count": 0, + "attachment_count": 0, + "description": "Simple sequential process: Start \u2192 Task 1 \u2192 Task 2 \u2192 Task 3 \u2192 End" + }, + "nodes": [ + { + "id": "task_1", + "name": "Task 1", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "task_2", + "name": "Task 2", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "task_3", + "name": "Task 3", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "start_event", + "name": "Start Event", + "type": "startEvent", + "lane": null, + "attached_to": null + }, + { + "id": "end_event", + "name": "End Event", + "type": "endEvent", + "lane": null, + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_event", + "target": "task_1" + }, + { + "id": "sf_2", + "name": "", + "source": "task_1", + "target": "task_2" + }, + { + "id": "sf_3", + "name": "", + "source": "task_2", + "target": "task_3" + }, + { + "id": "sf_4", + "name": "", + "source": "task_3", + "target": "end_event" + } + ] +} \ No newline at end of file diff --git a/data/01_bpmn_1_ground_truth.MMD b/data/01_bpmn_1_ground_truth.MMD new file mode 100644 index 0000000..1f4eef1 --- /dev/null +++ b/data/01_bpmn_1_ground_truth.MMD @@ -0,0 +1,15 @@ +--- +config: + theme: default +--- +flowchart LR + task_1["Task 1"] + task_2["Task 2"] + task_3["Task 3"] + start_event(["Start Event"]) + end_event(["End Event"]) + + 
start_event --> task_1 + task_1 --> task_2 + task_2 --> task_3 + task_3 --> end_event \ No newline at end of file diff --git a/data/02_bpmn_1.JSON b/data/02_bpmn_1.JSON new file mode 100644 index 0000000..63e66b6 --- /dev/null +++ b/data/02_bpmn_1.JSON @@ -0,0 +1,126 @@ +{ + "metadata": { + "id": "bpmn_1_02", + "source": "A.2.0.bpmn", + "diagram_type": "bpmn_process", + "tier": 1, + "entity_count": 8, + "container_count": 0, + "attachment_count": 0, + "description": "Process with exclusive gateway splitting into three paths: Task 2 goes directly to End Event, Tasks 3 and 4 merge before End Event" + }, + "nodes": [ + { + "id": "start_event", + "name": "Start Event", + "type": "startEvent", + "lane": null, + "attached_to": null + }, + { + "id": "task_1", + "name": "Task 1", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "gw_split", + "name": "Gateway\n(Split Flow)", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null + }, + { + "id": "task_2", + "name": "Task 2", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "task_3", + "name": "Task 3", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "task_4", + "name": "Task 4", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "gw_merge", + "name": "Gateway\n(Merge Flows)", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null + }, + { + "id": "end_event", + "name": "End Event", + "type": "endEvent", + "lane": null, + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_event", + "target": "task_1" + }, + { + "id": "sf_2", + "name": "", + "source": "task_1", + "target": "gw_split" + }, + { + "id": "sf_3", + "name": "", + "source": "gw_split", + "target": "task_2" + }, + { + "id": "sf_4", + "name": "", + "source": "gw_split", + "target": "task_3" + }, + { + "id": "sf_5", + "name": "", + "source": "gw_split", + "target": "task_4" + }, + { + "id": "sf_6", 
+ "name": "", + "source": "task_2", + "target": "end_event" + }, + { + "id": "sf_7", + "name": "", + "source": "task_3", + "target": "gw_merge" + }, + { + "id": "sf_8", + "name": "", + "source": "task_4", + "target": "gw_merge" + }, + { + "id": "sf_9", + "name": "", + "source": "gw_merge", + "target": "end_event" + } + ] +} \ No newline at end of file diff --git a/data/02_bpmn_1_ground_truth.MMD b/data/02_bpmn_1_ground_truth.MMD new file mode 100644 index 0000000..32d3541 --- /dev/null +++ b/data/02_bpmn_1_ground_truth.MMD @@ -0,0 +1,23 @@ +--- +config: + theme: default +--- +flowchart LR + start_event(["Start Event"]) + task_1["Task 1"] + gw_split{"Gateway\n(Split Flow)"} + task_2["Task 2"] + task_3["Task 3"] + task_4["Task 4"] + gw_merge{"Gateway\n(Merge Flows)"} + end_event(["End Event"]) + + start_event --> task_1 + task_1 --> gw_split + gw_split --> task_2 + gw_split --> task_3 + gw_split --> task_4 + task_2 --> end_event + task_3 --> gw_merge + task_4 --> gw_merge + gw_merge --> end_event diff --git a/data/03_bpmn_1.JSON b/data/03_bpmn_1.JSON new file mode 100644 index 0000000..7bae237 --- /dev/null +++ b/data/03_bpmn_1.JSON @@ -0,0 +1,138 @@ +{ + "metadata": { + "id": "bpmn_1_03", + "source": "A.2.1.bpmn", + "diagram_type": "bpmn_process", + "tier": 1, + "entity_count": 8, + "container_count": 0, + "attachment_count": 0, + "description": "Extended split-flow process: exclusive gateway with default/conditional paths; Task 2 and Task 4 have additional default flows into Task 3 as fallback merge target" + }, + "nodes": [ + { + "id": "start_event", + "name": "Start Event", + "type": "startEvent", + "lane": null, + "attached_to": null + }, + { + "id": "task_1", + "name": "Task 1", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "gw_split", + "name": "Gateway\n(Split Flow)", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null + }, + { + "id": "task_2", + "name": "Task 2", + "type": "task", + "lane": null, + "attached_to": 
null + }, + { + "id": "task_3", + "name": "Task 3", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "task_4", + "name": "Task 4", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "gw_merge", + "name": "Gateway\n(Merge Flows)", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null + }, + { + "id": "end_event", + "name": "End Event", + "type": "endEvent", + "lane": null, + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_event", + "target": "task_1" + }, + { + "id": "sf_2", + "name": "", + "source": "task_1", + "target": "gw_split" + }, + { + "id": "sf_3", + "name": "Default", + "source": "gw_split", + "target": "task_2" + }, + { + "id": "sf_4", + "name": "", + "source": "gw_split", + "target": "task_3" + }, + { + "id": "sf_5", + "name": "", + "source": "gw_split", + "target": "task_4" + }, + { + "id": "sf_6", + "name": "Condition", + "source": "task_2", + "target": "end_event" + }, + { + "id": "sf_7", + "name": "", + "source": "task_2", + "target": "task_3" + }, + { + "id": "sf_8", + "name": "", + "source": "task_3", + "target": "gw_merge" + }, + { + "id": "sf_9", + "name": "condition", + "source": "task_4", + "target": "gw_merge" + }, + { + "id": "sf_10", + "name": "", + "source": "task_4", + "target": "task_3" + }, + { + "id": "sf_11", + "name": "", + "source": "gw_merge", + "target": "end_event" + } + ] +} \ No newline at end of file diff --git a/data/03_bpmn_1_ground_truth.MMD b/data/03_bpmn_1_ground_truth.MMD new file mode 100644 index 0000000..2abc4ff --- /dev/null +++ b/data/03_bpmn_1_ground_truth.MMD @@ -0,0 +1,25 @@ +--- +config: + theme: default +--- +flowchart LR + start_event(["Start Event"]) + task_1["Task 1"] + gw_split{"Gateway\n(Split Flow)"} + task_2["Task 2"] + task_3["Task 3"] + task_4["Task 4"] + gw_merge{"Gateway\n(Merge Flows)"} + end_event(["End Event"]) + + start_event --> task_1 + task_1 --> gw_split + gw_split -->|"Default"| 
task_2 + gw_split --> task_3 + gw_split --> task_4 + task_2 -->|"Condition"| end_event + task_2 --> task_3 + task_3 --> gw_merge + task_4 -->|"condition"| gw_merge + task_4 --> task_3 + gw_merge --> end_event diff --git a/data/04_bpmn_1.JSON b/data/04_bpmn_1.JSON new file mode 100644 index 0000000..0048e71 --- /dev/null +++ b/data/04_bpmn_1.JSON @@ -0,0 +1,115 @@ +{ + "metadata": { + "id": "bpmn_1_04", + "source": "A.3.0.bpmn", + "diagram_type": "bpmn_process", + "tier": 1, + "entity_count": 9, + "container_count": 0, + "attachment_count": 2, + "description": "Process with collapsed sub-process having two boundary events: non-interrupting message boundary triggers Task 2, interrupting escalation boundary triggers Task 3 with separate End Event" + }, + "nodes": [ + { + "id": "start_event", + "name": "Start Event", + "type": "startEvent", + "lane": null, + "attached_to": null + }, + { + "id": "task_1", + "name": "Task 1", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "sub_process", + "name": "Collapsed\nSub-Process", + "type": "subProcess", + "lane": null, + "attached_to": null + }, + { + "id": "bnd_message", + "name": "Boundary Intermediate Event Non-Interrupting Message", + "type": "boundaryEvent", + "lane": null, + "attached_to": "sub_process" + }, + { + "id": "bnd_escalation", + "name": "Boundary Intermediate Event Interrupting Escalation", + "type": "boundaryEvent", + "lane": null, + "attached_to": "sub_process" + }, + { + "id": "task_2", + "name": "Task 2", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "task_3", + "name": "Task 3", + "type": "task", + "lane": null, + "attached_to": null + }, + { + "id": "end_event_1", + "name": "End Event 1", + "type": "endEvent", + "lane": null, + "attached_to": null + }, + { + "id": "end_event_2", + "name": "End Event 2", + "type": "endEvent", + "lane": null, + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_event", + 
"target": "task_1" + }, + { + "id": "sf_2", + "name": "", + "source": "task_1", + "target": "sub_process" + }, + { + "id": "sf_3", + "name": "", + "source": "bnd_message", + "target": "task_2" + }, + { + "id": "sf_4", + "name": "", + "source": "bnd_escalation", + "target": "task_3" + }, + { + "id": "sf_5", + "name": "", + "source": "task_2", + "target": "end_event_1" + }, + { + "id": "sf_6", + "name": "", + "source": "task_3", + "target": "end_event_2" + } + ] +} \ No newline at end of file diff --git a/data/04_bpmn_1_ground_truth.MMD b/data/04_bpmn_1_ground_truth.MMD new file mode 100644 index 0000000..53a3d34 --- /dev/null +++ b/data/04_bpmn_1_ground_truth.MMD @@ -0,0 +1,21 @@ +--- +config: + theme: default +--- +flowchart LR + start_event(["Start Event"]) + task_1["Task 1"] + sub_process[["Collapsed\nSub-Process"]] + task_2["Task 2"] + task_3["Task 3"] + end_event_1(["End Event 1"]) + end_event_2(["End Event 2"]) + + start_event --> task_1 + task_1 --> sub_process + task_2 --> end_event_1 + task_3 --> end_event_2 + sub_process o--o bnd_message(("Boundary Intermediate Event Non-Interrupting Message")) + bnd_message --> task_2 + sub_process o--o bnd_escalation(("Boundary Intermediate Event Interrupting Escalation")) + bnd_escalation --> task_3 \ No newline at end of file diff --git a/data/05_bpmn_1.JSON b/data/05_bpmn_1.JSON new file mode 100644 index 0000000..a5a76d7 --- /dev/null +++ b/data/05_bpmn_1.JSON @@ -0,0 +1,133 @@ +{ + "metadata": { + "id": "bpmn_1_05", + "source": "C.8.0.bpmn", + "diagram_type": "bpmn_process", + "tier": 1, + "entity_count": 9, + "container_count": 0, + "attachment_count": 0, + "description": "Vacation request process: DMN-based automated approval with three outcomes (refused auto, approved auto, manual validation); manager approval path" + }, + "nodes": [ + { + "id": "start_vacation_request", + "name": "Vacation Request Received", + "type": "startEvent", + "lane": null, + "attached_to": null + }, + { + "id": 
"task_get_vacation_status", + "name": "Get Current Vacation Status", + "type": "serviceTask", + "lane": null, + "attached_to": null + }, + { + "id": "task_vacation_approval", + "name": "Vacation Approval", + "type": "businessRuleTask", + "lane": null, + "attached_to": null + }, + { + "id": "gw_result", + "name": "", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null + }, + { + "id": "end_refused_auto", + "name": "Vacation Refused Automatically", + "type": "endEvent", + "lane": null, + "attached_to": null + }, + { + "id": "end_approved", + "name": "Vacation Approved", + "type": "endEvent", + "lane": null, + "attached_to": null + }, + { + "id": "task_manual_approve", + "name": "Manually Approve Vacation", + "type": "userTask", + "lane": null, + "attached_to": null + }, + { + "id": "gw_manager_decision", + "name": "", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null + }, + { + "id": "end_refused_manager", + "name": "Vacation Refused by Manager", + "type": "endEvent", + "lane": null, + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_vacation_request", + "target": "task_get_vacation_status" + }, + { + "id": "sf_2", + "name": "", + "source": "task_get_vacation_status", + "target": "task_vacation_approval" + }, + { + "id": "sf_3", + "name": "", + "source": "task_vacation_approval", + "target": "gw_result" + }, + { + "id": "sf_4", + "name": "Refused", + "source": "gw_result", + "target": "end_refused_auto" + }, + { + "id": "sf_5", + "name": "Approved", + "source": "gw_result", + "target": "end_approved" + }, + { + "id": "sf_6", + "name": "Manual Validation Required", + "source": "gw_result", + "target": "task_manual_approve" + }, + { + "id": "sf_7", + "name": "", + "source": "task_manual_approve", + "target": "gw_manager_decision" + }, + { + "id": "sf_8", + "name": "Approved", + "source": "gw_manager_decision", + "target": "end_approved" + }, + { + "id": "sf_9", + "name": "Refused", 
+ "source": "gw_manager_decision", + "target": "end_refused_manager" + } + ] +} \ No newline at end of file diff --git a/data/05_bpmn_1_ground_truth.MMD b/data/05_bpmn_1_ground_truth.MMD new file mode 100644 index 0000000..5c08cfc --- /dev/null +++ b/data/05_bpmn_1_ground_truth.MMD @@ -0,0 +1,24 @@ +--- +config: + theme: default +--- +flowchart LR + start_vacation_request(["Vacation Request Received"]) + task_get_vacation_status["Get Current Vacation Status"] + task_vacation_approval["Vacation Approval"] + gw_result{"Result"} + end_refused_auto(["Vacation Refused Automatically"]) + end_approved(["Vacation Approved"]) + task_manual_approve["Manually Approve Vacation"] + gw_manager_decision{"Manager Decision"} + end_refused_manager(["Vacation Refused by Manager"]) + + start_vacation_request --> task_get_vacation_status + task_get_vacation_status --> task_vacation_approval + task_vacation_approval --> gw_result + gw_result -->|"Refused"| end_refused_auto + gw_result -->|"Approved"| end_approved + gw_result -->|"Manual Validation Required"| task_manual_approve + task_manual_approve --> gw_manager_decision + gw_manager_decision -->|"Approved"| end_approved + gw_manager_decision -->|"Refused"| end_refused_manager \ No newline at end of file diff --git a/data/06_it_1.JSON b/data/06_it_1.JSON new file mode 100644 index 0000000..9fb10ce --- /dev/null +++ b/data/06_it_1.JSON @@ -0,0 +1,70 @@ +{ + "metadata": { + "id": "it_1_06", + "diagram_type": "c4_container", + "tier": 1, + "entity_count": 4, + "container_count": 1, + "attachment_count": 0, + "description": "C4 Container diagram of SomeApp: a public-facing web application hosted on Infomaniak Public Cloud, served via Cloudflare CDN with S3-compatible object storage" + }, + "system_boundary": { + "id": "infomaniak", + "name": "Infomaniak Public Cloud", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "user", + "name": "User", + "type": "person", + "description": "End user accessing the web application 
via browser" + }, + { + "id": "cloudflare", + "name": "Cloudflare", + "type": "external_system", + "description": "CDN and edge proxy. Terminates HTTPS at the edge (www.someapp.xx) and routes traffic to the origin server" + }, + { + "id": "web_app", + "name": "SomeApp", + "type": "container", + "technology": "Web Application", + "boundary": "infomaniak", + "description": "Public-facing web application. Receives proxied requests from Cloudflare and serves the SomeApp product" + }, + { + "id": "object_storage", + "name": "Object Storage", + "type": "container", + "technology": "S3-compatible Bucket", + "boundary": "infomaniak", + "description": "S3-compatible object storage bucket hosted on Infomaniak. Stores application assets and data" + } + ], + "relationships": [ + { + "id": "r1", + "source": "user", + "target": "cloudflare", + "label": "Accesses via HTTPS", + "technology": "HTTPS", + "note": "www.someapp.xx" + }, + { + "id": "r2", + "source": "cloudflare", + "target": "web_app", + "label": "Reverse proxy / origin request", + "technology": "HTTPS" + }, + { + "id": "r3", + "source": "web_app", + "target": "object_storage", + "label": "Reads and writes files", + "technology": "S3 API" + } + ] +} diff --git a/data/06_it_1_ground_truth.MMD b/data/06_it_1_ground_truth.MMD new file mode 100644 index 0000000..768a328 --- /dev/null +++ b/data/06_it_1_ground_truth.MMD @@ -0,0 +1,17 @@ +--- +config: + theme: default +--- +flowchart LR + user["User\n[Person]"] + cloudflare["Cloudflare\n[External System]\nCDN / Edge Proxy"] + + subgraph infomaniak["Infomaniak Public Cloud"] + direction LR + web_app["SomeApp\n[Container]\nWeb Application"] + object_storage[("Object Storage\n[Container]\nS3-compatible Bucket")] + end + + user -->|"HTTPS · www.someapp.xx"| cloudflare + cloudflare -->|"Reverse proxy"| web_app + web_app -->|"Read / write · S3 API"| object_storage diff --git a/data/07_it_1.JSON b/data/07_it_1.JSON new file mode 100644 index 0000000..52cad0d --- /dev/null +++ 
b/data/07_it_1.JSON @@ -0,0 +1,98 @@ +{ + "metadata": { + "id": "it_1_07", + "diagram_type": "c4_container", + "tier": 1, + "entity_count": 6, + "container_count": 1, + "attachment_count": 0, + "description": "C4 Container diagram of SomeApp: extends it_1_06 with a PostgreSQL database and developer SSH access to the application server" + }, + "system_boundary": { + "id": "infomaniak", + "name": "Infomaniak Public Cloud", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "user", + "name": "User", + "type": "person", + "description": "End user accessing the web application via browser" + }, + { + "id": "developer", + "name": "Developer", + "type": "person", + "description": "Developer or operator managing the application server via SSH" + }, + { + "id": "cloudflare", + "name": "Cloudflare", + "type": "external_system", + "description": "CDN and edge proxy. Terminates HTTPS at the edge (www.someapp.xx) and routes traffic to the origin server" + }, + { + "id": "web_app", + "name": "SomeApp", + "type": "container", + "technology": "Web Application", + "boundary": "infomaniak", + "description": "Public-facing web application. 
Receives proxied requests from Cloudflare and reads/writes to database and object storage" + }, + { + "id": "postgres", + "name": "PostgreSQL", + "type": "container", + "technology": "Relational Database", + "boundary": "infomaniak", + "description": "Primary relational database storing application data" + }, + { + "id": "object_storage", + "name": "Object Storage", + "type": "container", + "technology": "S3-compatible Bucket", + "boundary": "infomaniak", + "description": "S3-compatible object storage bucket for application assets and files" + } + ], + "relationships": [ + { + "id": "r1", + "source": "user", + "target": "cloudflare", + "label": "Accesses via HTTPS", + "technology": "HTTPS", + "note": "www.someapp.xx" + }, + { + "id": "r2", + "source": "cloudflare", + "target": "web_app", + "label": "Reverse proxy / origin request", + "technology": "HTTPS" + }, + { + "id": "r3", + "source": "web_app", + "target": "postgres", + "label": "Reads and writes application data", + "technology": "PostgreSQL" + }, + { + "id": "r4", + "source": "web_app", + "target": "object_storage", + "label": "Reads and writes files", + "technology": "S3 API" + }, + { + "id": "r5", + "source": "developer", + "target": "web_app", + "label": "Manages application server", + "technology": "SSH" + } + ] +} diff --git a/data/07_it_1_ground_truth.MMD b/data/07_it_1_ground_truth.MMD new file mode 100644 index 0000000..041f3b4 --- /dev/null +++ b/data/07_it_1_ground_truth.MMD @@ -0,0 +1,21 @@ +--- +config: + theme: default +--- +flowchart LR + user["User\n[Person]"] + developer["Developer\n[Person]"] + cloudflare["Cloudflare\n[External System]\nCDN / Edge Proxy"] + + subgraph infomaniak["Infomaniak Public Cloud"] + direction LR + web_app["SomeApp\n[Container]\nWeb Application"] + postgres[("PostgreSQL\n[Container]\nRelational Database")] + object_storage[("Object Storage\n[Container]\nS3-compatible Bucket")] + end + + user -->|"HTTPS · www.someapp.xx"| cloudflare + cloudflare -->|"Reverse proxy"| 
web_app + web_app -->|"Read / write · PostgreSQL"| postgres + web_app -->|"Read / write · S3 API"| object_storage + developer -->|"SSH"| web_app diff --git a/data/08_it_1.JSON b/data/08_it_1.JSON new file mode 100644 index 0000000..985b43c --- /dev/null +++ b/data/08_it_1.JSON @@ -0,0 +1,88 @@ +{ + "metadata": { + "id": "it_1_08", + "diagram_type": "c4_container", + "tier": 1, + "entity_count": 4, + "container_count": 3, + "attachment_count": 0, + "description": "C4 Container diagram of a Google Apps Script web app restricted to a Google Workspace OU. The app (Code.gs + Index.html) executes as the deployer and reads/writes data.json stored on the deployer's Google Drive via the Drive API" + }, + "system_boundary": { + "id": "google_workspace", + "name": "Google Workspace", + "type": "organizational_unit" + }, + "elements": [ + { + "id": "user", + "name": "User", + "type": "person", + "description": "Workspace OU member accessing the web app via browser" + }, + { + "id": "apps_script_app", + "name": "Apps Script Web App", + "type": "container", + "technology": "Google Apps Script", + "boundary": "google_workspace", + "description": "Web app deployed via Google Apps Script. OU-restricted. Executes as the deployer identity" + }, + { + "id": "index_html", + "name": "Index.html", + "type": "component", + "technology": "HTML / JavaScript", + "boundary": "apps_script_app", + "description": "Frontend served to the user. 
Calls Code.gs for data operations" + }, + { + "id": "code_gs", + "name": "Code.gs", + "type": "component", + "technology": "Google Apps Script (GAS)", + "boundary": "apps_script_app", + "description": "Backend script handling Drive API calls to read and write data.json" + }, + { + "id": "google_drive", + "name": "Google Drive", + "type": "container", + "technology": "Google Drive", + "boundary": "google_workspace", + "description": "Deployer's Google Drive hosting the data.json file" + }, + { + "id": "data_json", + "name": "data.json", + "type": "container", + "technology": "Data Store", + "boundary": "google_drive", + "description": "JSON file on the deployer's Google Drive. Read and written by Code.gs via the Drive API" + } + ], + "relationships": [ + { + "id": "r1", + "source": "user", + "target": "index_html", + "label": "Accesses via browser", + "technology": "HTTPS", + "note": "OU-restricted" + }, + { + "id": "r2", + "source": "index_html", + "target": "code_gs", + "label": "Calls backend", + "technology": "GAS" + }, + { + "id": "r3", + "source": "code_gs", + "target": "data_json", + "label": "Reads and writes", + "technology": "Drive API" + } + ] +} diff --git a/data/08_it_1_ground_truth.MMD b/data/08_it_1_ground_truth.MMD new file mode 100644 index 0000000..4ac4e4c --- /dev/null +++ b/data/08_it_1_ground_truth.MMD @@ -0,0 +1,22 @@ +--- +config: + theme: default +--- +flowchart TD + user["User\n[Person]"] + + subgraph google_workspace["Google Workspace"] + direction LR + subgraph apps_script_app["Apps Script\n[Container]"] + direction LR + index_html["Index.html\n[Component]"] + code_gs["Code.gs\n[Component]"] + end + subgraph google_drive["G-Drive\n[Container]"] + data_json[("data.json\n[Data Store]")] + end + end + + user -->|"HTTPS · OU-restricted"| index_html + index_html -->|"Calls backend"| code_gs + code_gs -->|"Read / write · Drive API"| data_json diff --git a/data/09_it_1.JSON b/data/09_it_1.JSON new file mode 100644 index 0000000..6384fdf --- 
/dev/null +++ b/data/09_it_1.JSON @@ -0,0 +1,109 @@ +{ + "metadata": { + "id": "it_1_09", + "diagram_type": "c4_container", + "tier": 1, + "entity_count": 6, + "container_count": 2, + "attachment_count": 0, + "description": "C4 Container diagram of a GCP-hosted data analysis stack. A user authenticates via Google IAP to reach a web app that acts as an MCP host. The web app queries Gemini as the AI model and calls an MCP server with read-only access to an orders view on a PostgreSQL database" + }, + "system_boundary": { + "id": "google_cloud", + "name": "Google Cloud", + "type": "cloud_environment" + }, + "elements": [ + { + "id": "user", + "name": "User", + "type": "person", + "description": "Analyst accessing the web app via browser" + }, + { + "id": "iap", + "name": "Google IAP", + "type": "container", + "technology": "Identity-Aware Proxy", + "boundary": "google_cloud", + "description": "Google Identity-Aware Proxy. Enforces authentication before forwarding requests to the web app" + }, + { + "id": "web_app", + "name": "Web App", + "type": "container", + "technology": "Web Application / MCP Host", + "boundary": "google_cloud", + "description": "GCP-hosted web application acting as the MCP host. Sends prompts to Gemini and dispatches MCP tool calls to the MCP server" + }, + { + "id": "gemini", + "name": "Gemini", + "type": "container", + "technology": "Vertex AI (Google-managed)", + "boundary": "google_cloud", + "description": "Gemini model served via Vertex AI within the GCP project. 
Processes user prompts and drives MCP tool selection" + }, + { + "id": "mcp_server", + "name": "MCP Server", + "type": "container", + "technology": "MCP Server", + "boundary": "google_cloud", + "description": "MCP server exposing the orders view as a read-only query tool" + }, + { + "id": "postgres", + "name": "PostgreSQL", + "type": "container", + "technology": "Relational Database", + "boundary": "google_cloud", + "description": "PostgreSQL database on GCP hosting the orders view" + }, + { + "id": "orders_view", + "name": "orders", + "type": "container", + "technology": "Data Store (DB View)", + "boundary": "postgres", + "description": "Read-only PostgreSQL view exposing order data. Accessed exclusively by the MCP server" + } + ], + "relationships": [ + { + "id": "r1", + "source": "user", + "target": "iap", + "label": "Authenticates", + "technology": "HTTPS" + }, + { + "id": "r2", + "source": "iap", + "target": "web_app", + "label": "Forwards authenticated request", + "technology": "HTTPS" + }, + { + "id": "r3", + "source": "web_app", + "target": "gemini", + "label": "Sends prompts / receives responses", + "technology": "Gemini API" + }, + { + "id": "r4", + "source": "web_app", + "target": "mcp_server", + "label": "Calls MCP tools", + "technology": "MCP" + }, + { + "id": "r5", + "source": "mcp_server", + "target": "orders_view", + "label": "Read-only query", + "technology": "SQL" + } + ] +} diff --git a/data/09_it_1_ground_truth.MMD b/data/09_it_1_ground_truth.MMD new file mode 100644 index 0000000..1dbf8bc --- /dev/null +++ b/data/09_it_1_ground_truth.MMD @@ -0,0 +1,23 @@ +--- +config: + theme: default +--- +flowchart LR + user["User\n[Person]"] + + subgraph google_cloud["Google Cloud"] + direction LR + iap["Google IAP\n[Container]"] + web_app["Web App\n[Container]\nMCP Host"] + gemini["Gemini\n[Container]\nVertex AI"] + mcp_server["MCP Server\n[Container]"] + subgraph postgres["PostgreSQL\n[Container]"] + orders_view[("orders\n[Data Store]")] + end + end + + 
user -->|"HTTPS"| iap + iap -->|"Forwards request"| web_app + web_app -->|"Prompts · Vertex AI API"| gemini + web_app -->|"MCP tool calls"| mcp_server + mcp_server -->|"Read-only · SQL"| orders_view diff --git a/data/10_it_1.JSON b/data/10_it_1.JSON new file mode 100644 index 0000000..d3b5210 --- /dev/null +++ b/data/10_it_1.JSON @@ -0,0 +1,126 @@ +{ + "metadata": { + "id": "it_1_10", + "diagram_type": "network_topology", + "tier": 1, + "entity_count": 8, + "container_count": 1, + "attachment_count": 0, + "description": "Small office network topology. Internet traffic enters via a router, passes through a firewall, and is distributed by a core switch. The switch connects wired devices (NAS, printer, VoIP phones) and an access point serving WiFi clients (laptops)" + }, + "system_boundary": { + "id": "office_lan", + "name": "Office LAN", + "type": "network_boundary" + }, + "elements": [ + { + "id": "router", + "name": "Router", + "type": "device", + "technology": "Router", + "boundary": "office_lan", + "description": "Edge router. WAN uplink to ISP, LAN downlink to firewall" + }, + { + "id": "firewall", + "name": "Firewall", + "type": "device", + "technology": "Firewall", + "boundary": "office_lan", + "description": "Network firewall enforcing traffic policies between WAN and LAN" + }, + { + "id": "switch", + "name": "Switch", + "type": "device", + "technology": "Network Switch", + "boundary": "office_lan", + "description": "Core office switch. 
Distributes traffic to all wired devices and the access point" + }, + { + "id": "access_point", + "name": "Access Point", + "type": "device", + "technology": "WiFi Access Point", + "boundary": "office_lan", + "description": "Wireless access point providing WiFi connectivity to laptops" + }, + { + "id": "nas", + "name": "NAS", + "type": "server", + "technology": "Network Attached Storage", + "boundary": "office_lan", + "description": "Shared file storage for the office, wired to the switch" + }, + { + "id": "printer", + "name": "Printer", + "type": "peripheral", + "technology": "Network Printer", + "boundary": "office_lan", + "description": "Shared office printer, wired to the switch" + }, + { + "id": "voip_phones", + "name": "VoIP Phones", + "type": "device", + "technology": "VoIP", + "boundary": "office_lan", + "description": "Wired VoIP phones connected to the switch" + }, + { + "id": "user_clients", + "name": "User Clients", + "type": "device", + "technology": "Laptops (WiFi)", + "boundary": "office_lan", + "description": "Employee laptops connecting wirelessly via the access point" + } + ], + "relationships": [ + { + "id": "r1", + "source": "router", + "target": "firewall", + "label": "WAN → LAN" + }, + { + "id": "r2", + "source": "firewall", + "target": "switch", + "label": "Filtered traffic" + }, + { + "id": "r3", + "source": "switch", + "target": "nas", + "label": "Wired" + }, + { + "id": "r4", + "source": "switch", + "target": "printer", + "label": "Wired" + }, + { + "id": "r5", + "source": "switch", + "target": "voip_phones", + "label": "Wired" + }, + { + "id": "r6", + "source": "switch", + "target": "access_point", + "label": "Wired uplink" + }, + { + "id": "r7", + "source": "access_point", + "target": "user_clients", + "label": "WiFi" + } + ] +} diff --git a/data/10_it_1_ground_truth.MMD b/data/10_it_1_ground_truth.MMD new file mode 100644 index 0000000..c08db1f --- /dev/null +++ b/data/10_it_1_ground_truth.MMD @@ -0,0 +1,24 @@ +--- +config: + theme: 
default +--- +flowchart LR + subgraph office_lan["Office LAN"] + direction LR + router["Router\n[Device]"] + firewall["Firewall\n[Device]"] + switch["Switch\n[Device]"] + access_point["Access Point\n[Device]"] + nas[("NAS\n[Server]")] + printer["Printer\n[Peripheral]"] + voip_phones["VoIP Phones\n[Device]"] + user_clients["User Clients\n[Laptops]"] + end + + router -->|"WAN → LAN"| firewall + firewall -->|"Filtered traffic"| switch + switch -->|"Wired"| nas + switch -->|"Wired"| printer + switch -->|"Wired"| voip_phones + switch -->|"Wired uplink"| access_point + access_point -->|"WiFi"| user_clients diff --git a/data/bpmn_collaboration_01.JSON b/data/11_bpmn_2.JSON similarity index 71% rename from data/bpmn_collaboration_01.JSON rename to data/11_bpmn_2.JSON index 07813b4..e2e7fed 100644 --- a/data/bpmn_collaboration_01.JSON +++ b/data/11_bpmn_2.JSON @@ -1,22 +1,29 @@ { "metadata": { - "id": "bpmn_collaboration_01", + "id": "bpmn_2_11", "source": "A_4_0-roundtrip.bpmn", "diagram_type": "bpmn_collaboration", "tier": 2, - "entity_count": 17, + "entity_count": 15, + "container_count": 6, + "attachment_count": 0, "description": "Two-pool BPMN collaboration with message flows, lanes, and expanded sub-processes" }, "participants": [ { "id": "pool_1", "name": "Pool", - "lanes": ["lane_pool1"] + "lanes": [ + "lane_pool1" + ] }, { "id": "pool_2", "name": "Pool 2", - "lanes": ["lane_1", "lane_2"] + "lanes": [ + "lane_1", + "lane_2" + ] } ], "lanes": [ @@ -158,19 +165,71 @@ } ], "sequence_flows": [ - { "id": "sf_1", "source": "start_event_1", "target": "task_1" }, - { "id": "sf_2", "source": "task_1", "target": "task_2" }, - { "id": "sf_3", "source": "task_2", "target": "end_event_1" }, - { "id": "sf_4", "source": "start_event_2", "target": "task_3" }, - { "id": "sf_5", "source": "task_3", "target": "subprocess_1" }, - { "id": "sf_6", "source": "task_3", "target": "subprocess_2" }, - { "id": "sf_7", "source": "start_event_3", "target": "task_4" }, - { "id": "sf_8", "source": 
"task_4", "target": "end_event_3" }, - { "id": "sf_9", "source": "subprocess_1", "target": "task_5" }, - { "id": "sf_10", "source": "task_5", "target": "end_event_2" }, - { "id": "sf_11", "source": "start_event_4", "target": "task_6" }, - { "id": "sf_12", "source": "task_6", "target": "end_event_4" }, - { "id": "sf_13", "source": "subprocess_2", "target": "end_event_5" } + { + "id": "sf_1", + "source": "start_event_1", + "target": "task_1" + }, + { + "id": "sf_2", + "source": "task_1", + "target": "task_2" + }, + { + "id": "sf_3", + "source": "task_2", + "target": "end_event_1" + }, + { + "id": "sf_4", + "source": "start_event_2", + "target": "task_3" + }, + { + "id": "sf_5", + "source": "task_3", + "target": "subprocess_1" + }, + { + "id": "sf_6", + "source": "task_3", + "target": "subprocess_2" + }, + { + "id": "sf_7", + "source": "start_event_3", + "target": "task_4" + }, + { + "id": "sf_8", + "source": "task_4", + "target": "end_event_3" + }, + { + "id": "sf_9", + "source": "subprocess_1", + "target": "task_5" + }, + { + "id": "sf_10", + "source": "task_5", + "target": "end_event_2" + }, + { + "id": "sf_11", + "source": "start_event_4", + "target": "task_6" + }, + { + "id": "sf_12", + "source": "task_6", + "target": "end_event_4" + }, + { + "id": "sf_13", + "source": "subprocess_2", + "target": "end_event_5" + } ], "message_flows": [ { diff --git a/data/bpmn_collaboration_01_ground_truth.MMD b/data/11_bpmn_2_ground_truth.MMD similarity index 95% rename from data/bpmn_collaboration_01_ground_truth.MMD rename to data/11_bpmn_2_ground_truth.MMD index 07cc5bc..2577afe 100644 --- a/data/bpmn_collaboration_01_ground_truth.MMD +++ b/data/11_bpmn_2_ground_truth.MMD @@ -1,3 +1,7 @@ +--- +config: + theme: default +--- flowchart LR subgraph pool_1["Pool"] direction LR @@ -46,4 +50,4 @@ flowchart LR task_6 --> end_event_4 subprocess_2 --> end_event_5 task_1 -.->|Message Flow 1| task_3 - task_5 -.->|Message Flow 2| task_2 \ No newline at end of file + task_5 -.->|Message 
Flow 2| task_2 diff --git a/data/12_bpmn_2.JSON b/data/12_bpmn_2.JSON new file mode 100644 index 0000000..fd94f4f --- /dev/null +++ b/data/12_bpmn_2.JSON @@ -0,0 +1,347 @@ +{ + "metadata": { + "id": "bpmn_2_12", + "source": "C.1.0.bpmn", + "diagram_type": "bpmn_collaboration", + "tier": 2, + "entity_count": 21, + "container_count": 5, + "attachment_count": 0, + "description": "Invoice receipt collaboration: Team-Assistant pool handles scan/archive/approval-assignment with event-based waiting; Process Engine pool (with Approver, Team Assistant, Accountant lanes) handles approval workflow and bank transfer" + }, + "participants": [ + { + "id": "pool_team_assistant", + "name": "Team-Assistant" + }, + { + "id": "Process_Engine_1", + "name": "Process Engine - Invoice Receipt" + } + ], + "lanes": [ + { + "id": "lane_ta", + "name": "", + "pool": "pool_team_assistant" + }, + { + "id": "Approver", + "name": "Approver", + "pool": "Process_Engine_1" + }, + { + "id": "teamAssistant", + "name": "Team Assistant", + "pool": "Process_Engine_1" + }, + { + "id": "Accountant", + "name": "Accountant", + "pool": "Process_Engine_1" + } + ], + "nodes": [ + { + "id": "start_ta", + "name": "Invoice received", + "type": "startEvent", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "task_scan_invoice", + "name": "Scan Invoice", + "type": "task", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "task_archive_original", + "name": "Archive original", + "type": "task", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "icatch_approver_assigned", + "name": "Approver to be assigned", + "type": "intermediateCatchEvent", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "task_assign_approver", + "name": "Assign approver", + "type": "task", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "gw_event_based", + "name": "", + "type": "eventBasedGateway", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "icatch_review_needed", + "name": 
"Invoice review needed", + "type": "intermediateCatchEvent", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "task_review_document", + "name": "Review and document result", + "type": "task", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "icatch_7_days", + "name": "7 days", + "type": "intermediateCatchEvent", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "end_ta_1", + "name": "End Process", + "type": "endEvent", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "end_ta_2", + "name": "End Process", + "type": "endEvent", + "lane": "lane_ta", + "attached_to": null + }, + { + "id": "StartEvent_1", + "name": "Invoice received", + "type": "startEvent", + "lane": "teamAssistant", + "attached_to": null + }, + { + "id": "assignApprover", + "name": "Assign Approver", + "type": "userTask", + "lane": "teamAssistant", + "attached_to": null + }, + { + "id": "reviewInvoice", + "name": "Review Invoice", + "type": "userTask", + "lane": "teamAssistant", + "attached_to": null + }, + { + "id": "reviewSuccessful_gw", + "name": "Review successful?", + "type": "exclusiveGateway", + "lane": "teamAssistant", + "attached_to": null + }, + { + "id": "invoiceNotProcessed", + "name": "Invoice not processed", + "type": "endEvent", + "lane": "teamAssistant", + "attached_to": null + }, + { + "id": "approveInvoice", + "name": "Approve Invoice", + "type": "userTask", + "lane": "Approver", + "attached_to": null + }, + { + "id": "invoice_approved", + "name": "Invoice approved?", + "type": "exclusiveGateway", + "lane": "Approver", + "attached_to": null + }, + { + "id": "prepareBankTransfer", + "name": "Prepare Bank Transfer", + "type": "userTask", + "lane": "Accountant", + "attached_to": null + }, + { + "id": "archiveInvoice", + "name": "Archive Invoice", + "type": "serviceTask", + "lane": "Accountant", + "attached_to": null + }, + { + "id": "invoiceProcessed", + "name": "Invoice processed", + "type": "endEvent", + "lane": "Accountant", + "attached_to": 
null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_ta", + "target": "task_scan_invoice" + }, + { + "id": "sf_2", + "name": "", + "source": "task_scan_invoice", + "target": "task_archive_original" + }, + { + "id": "sf_3", + "name": "", + "source": "task_archive_original", + "target": "icatch_approver_assigned" + }, + { + "id": "sf_4", + "name": "", + "source": "icatch_approver_assigned", + "target": "task_assign_approver" + }, + { + "id": "sf_5", + "name": "", + "source": "task_assign_approver", + "target": "gw_event_based" + }, + { + "id": "sf_6", + "name": "", + "source": "gw_event_based", + "target": "icatch_7_days" + }, + { + "id": "sf_7", + "name": "", + "source": "gw_event_based", + "target": "icatch_review_needed" + }, + { + "id": "sf_8", + "name": "", + "source": "icatch_7_days", + "target": "end_ta_1" + }, + { + "id": "sf_9", + "name": "", + "source": "icatch_review_needed", + "target": "task_review_document" + }, + { + "id": "sf_10", + "name": "", + "source": "task_review_document", + "target": "end_ta_2" + }, + { + "id": "sf_11", + "name": "", + "source": "StartEvent_1", + "target": "assignApprover" + }, + { + "id": "sf_12", + "name": "", + "source": "assignApprover", + "target": "approveInvoice" + }, + { + "id": "sf_13", + "name": "", + "source": "approveInvoice", + "target": "invoice_approved" + }, + { + "id": "sf_14", + "name": "yes", + "source": "invoice_approved", + "target": "prepareBankTransfer" + }, + { + "id": "sf_15", + "name": "no", + "source": "invoice_approved", + "target": "reviewInvoice" + }, + { + "id": "sf_16", + "name": "", + "source": "reviewInvoice", + "target": "reviewSuccessful_gw" + }, + { + "id": "sf_17", + "name": "yes", + "source": "reviewSuccessful_gw", + "target": "approveInvoice" + }, + { + "id": "sf_18", + "name": "no", + "source": "reviewSuccessful_gw", + "target": "invoiceNotProcessed" + }, + { + "id": "sf_19", + "name": "", + "source": "prepareBankTransfer", + "target": 
"archiveInvoice" + }, + { + "id": "sf_20", + "name": "", + "source": "archiveInvoice", + "target": "invoiceProcessed" + } + ], + "message_flows": [ + { + "id": "mf_1", + "name": "", + "source": "task_scan_invoice", + "target": "StartEvent_1" + }, + { + "id": "mf_2", + "name": "", + "source": "task_assign_approver", + "target": "assignApprover" + }, + { + "id": "mf_3", + "name": "", + "source": "assignApprover", + "target": "icatch_approver_assigned" + }, + { + "id": "mf_4", + "name": "", + "source": "task_review_document", + "target": "reviewInvoice" + }, + { + "id": "mf_5", + "name": "", + "source": "reviewInvoice", + "target": "icatch_review_needed" + } + ] +} \ No newline at end of file diff --git a/data/12_bpmn_2_ground_truth.MMD b/data/12_bpmn_2_ground_truth.MMD new file mode 100644 index 0000000..378a88a --- /dev/null +++ b/data/12_bpmn_2_ground_truth.MMD @@ -0,0 +1,62 @@ +--- +config: + theme: default +--- +flowchart TB + subgraph pool_team_assistant["Team-Assistant"] + start_ta(["Invoice received"]) + task_scan_invoice["Scan Invoice"] + task_archive_original["Archive original"] + icatch_approver_assigned(("Approver to be assigned")) + task_assign_approver["Assign approver"] + gw_event_based{" "} + icatch_review_needed(("Invoice review needed")) + task_review_document["Review and document result"] + icatch_7_days(("7 days")) + end_ta_1(["End Process"]) + end_ta_2(["End Process"]) + end + subgraph Process_Engine_1["Process Engine - Invoice Receipt"] + subgraph teamAssistant["Team Assistant"] + StartEvent_1(["Invoice received"]) + assignApprover["Assign Approver"] + reviewInvoice["Review Invoice"] + reviewSuccessful_gw{"Review successful?"} + invoiceNotProcessed(["Invoice not processed"]) + end + subgraph Approver["Approver"] + approveInvoice["Approve Invoice"] + invoice_approved{"Invoice approved?"} + end + subgraph Accountant["Accountant"] + prepareBankTransfer["Prepare Bank Transfer"] + archiveInvoice["Archive Invoice"] + invoiceProcessed(["Invoice 
processed"]) + end + end + + start_ta --> task_scan_invoice + task_scan_invoice --> task_archive_original + task_archive_original --> icatch_approver_assigned + icatch_approver_assigned --> task_assign_approver + task_assign_approver --> gw_event_based + gw_event_based --> icatch_7_days + gw_event_based --> icatch_review_needed + icatch_7_days --> end_ta_1 + icatch_review_needed --> task_review_document + task_review_document --> end_ta_2 + StartEvent_1 --> assignApprover + assignApprover --> approveInvoice + approveInvoice --> invoice_approved + invoice_approved -->|"yes"| prepareBankTransfer + invoice_approved -->|"no"| reviewInvoice + reviewInvoice --> reviewSuccessful_gw + reviewSuccessful_gw -->|"yes"| approveInvoice + reviewSuccessful_gw -->|"no"| invoiceNotProcessed + prepareBankTransfer --> archiveInvoice + archiveInvoice --> invoiceProcessed + task_scan_invoice -.-> StartEvent_1 + task_assign_approver -.-> assignApprover + assignApprover -.-> icatch_approver_assigned + task_review_document -.-> reviewInvoice + reviewInvoice -.-> icatch_review_needed diff --git a/data/13_bpmn_2.JSON b/data/13_bpmn_2.JSON new file mode 100644 index 0000000..2f463b1 --- /dev/null +++ b/data/13_bpmn_2.JSON @@ -0,0 +1,275 @@ +{ + "metadata": { + "id": "bpmn_2_13", + "source": "C.4.0.bpmn", + "diagram_type": "bpmn_collaboration", + "tier": 2, + "entity_count": 18, + "container_count": 3, + "attachment_count": 0, + "description": "Employee onboarding within Money Bank: HR Department handles contract signing and parallel onboarding tasks; Responsible Department introduces the team, runs position training, and delivers the welcome package" + }, + "participants": [ + { + "id": "pool_money_bank", + "name": "Money Bank" + } + ], + "lanes": [ + { + "id": "lane_hr_dept", + "name": "HR Department", + "pool": "pool_money_bank" + }, + { + "id": "lane_responsible_dept", + "name": "Responsible Department", + "pool": "pool_money_bank" + } + ], + "nodes": [ + { + "id": 
"start_candidate_accepted", + "name": "Candidate accepted offer", + "type": "startEvent", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_send_candidate_contract", + "name": "Send candidate Contract", + "type": "userTask", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "gw_contract_terms", + "name": "Contract terms accepted?", + "type": "exclusiveGateway", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_review_terms_of_contract", + "name": "Review terms of contract", + "type": "userTask", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_get_signature_notify_dept", + "name": "Get signature on contract and notify responsible department", + "type": "userTask", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "gw_parallel_split", + "name": "", + "type": "parallelGateway", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_inform_company_policies", + "name": "Inform employee of company policies", + "type": "userTask", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_intro_mission_vision_values", + "name": "Introduce employee to company Mission, Vision and Values", + "type": "userTask", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_training_time_reports", + "name": "Perform training for time reports sick leave and holidays", + "type": "userTask", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_register_medical_insurance", + "name": "Register for medical insurance", + "type": "userTask", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "gw_parallel_merge", + "name": "", + "type": "parallelGateway", + "lane": "lane_hr_dept", + "attached_to": null + }, + { + "id": "task_request_preparations", + "name": "Request preparations for a new employee", + "type": "userTask", + "lane": "lane_responsible_dept", + "attached_to": null + }, + { + "id": "ithrow_new_employee_in_dept", 
+ "name": "New employee in department X", + "type": "intermediateThrowEvent", + "lane": "lane_responsible_dept", + "attached_to": null + }, + { + "id": "task_introduce_team", + "name": "Introduce new employee to the team", + "type": "userTask", + "lane": "lane_responsible_dept", + "attached_to": null + }, + { + "id": "task_training_for_position", + "name": "Perform training for position", + "type": "userTask", + "lane": "lane_responsible_dept", + "attached_to": null + }, + { + "id": "task_compile_welcome_package", + "name": "Compile welcome package", + "type": "userTask", + "lane": "lane_responsible_dept", + "attached_to": null + }, + { + "id": "task_give_welcome_package", + "name": "Give employee welcome package", + "type": "userTask", + "lane": "lane_responsible_dept", + "attached_to": null + }, + { + "id": "end_event", + "name": "End Event", + "type": "endEvent", + "lane": "lane_responsible_dept", + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_candidate_accepted", + "target": "task_send_candidate_contract" + }, + { + "id": "sf_2", + "name": "", + "source": "task_send_candidate_contract", + "target": "gw_contract_terms" + }, + { + "id": "sf_3", + "name": "Yes", + "source": "gw_contract_terms", + "target": "task_get_signature_notify_dept" + }, + { + "id": "sf_4", + "name": "No", + "source": "gw_contract_terms", + "target": "task_review_terms_of_contract" + }, + { + "id": "sf_5", + "name": "", + "source": "task_review_terms_of_contract", + "target": "task_send_candidate_contract" + }, + { + "id": "sf_6", + "name": "", + "source": "task_get_signature_notify_dept", + "target": "gw_parallel_split" + }, + { + "id": "sf_7", + "name": "", + "source": "gw_parallel_split", + "target": "task_inform_company_policies" + }, + { + "id": "sf_8", + "name": "", + "source": "task_inform_company_policies", + "target": "task_intro_mission_vision_values" + }, + { + "id": "sf_9", + "name": "", + "source": 
"task_intro_mission_vision_values", + "target": "task_training_time_reports" + }, + { + "id": "sf_10", + "name": "", + "source": "task_training_time_reports", + "target": "task_register_medical_insurance" + }, + { + "id": "sf_11", + "name": "", + "source": "task_register_medical_insurance", + "target": "gw_parallel_merge" + }, + { + "id": "sf_12", + "name": "", + "source": "gw_parallel_split", + "target": "task_request_preparations" + }, + { + "id": "sf_13", + "name": "", + "source": "task_request_preparations", + "target": "ithrow_new_employee_in_dept" + }, + { + "id": "sf_14", + "name": "", + "source": "ithrow_new_employee_in_dept", + "target": "gw_parallel_merge" + }, + { + "id": "sf_15", + "name": "", + "source": "gw_parallel_merge", + "target": "task_introduce_team" + }, + { + "id": "sf_16", + "name": "", + "source": "task_introduce_team", + "target": "task_training_for_position" + }, + { + "id": "sf_17", + "name": "", + "source": "task_training_for_position", + "target": "task_compile_welcome_package" + }, + { + "id": "sf_18", + "name": "", + "source": "task_compile_welcome_package", + "target": "task_give_welcome_package" + }, + { + "id": "sf_19", + "name": "", + "source": "task_give_welcome_package", + "target": "end_event" + } + ], + "message_flows": [] +} \ No newline at end of file diff --git a/data/13_bpmn_2_ground_truth.MMD b/data/13_bpmn_2_ground_truth.MMD new file mode 100644 index 0000000..a17c107 --- /dev/null +++ b/data/13_bpmn_2_ground_truth.MMD @@ -0,0 +1,51 @@ +--- +config: + theme: default +--- +flowchart LR + subgraph pool_money_bank["Money Bank"] + direction LR + subgraph lane_hr_dept["HR Department"] + direction LR + start_candidate_accepted(["Candidate accepted offer"]) + task_send_candidate_contract["Send candidate Contract"] + gw_contract_terms{"Contract terms accepted?"} + task_review_terms_of_contract["Review terms of contract"] + task_get_signature_notify_dept["Get signature on contract and notify responsible department"] + 
gw_parallel_split{{"+"}} + task_inform_company_policies["Inform employee of company policies"] + task_intro_mission_vision_values["Introduce employee to company Mission, Vision and Values"] + task_training_time_reports["Perform training for time reports sick leave and holidays"] + task_register_medical_insurance["Register for medical insurance"] + gw_parallel_merge{{"+"}} + end + subgraph lane_responsible_dept["Responsible Department"] + direction LR + task_request_preparations["Request preparations for a new employee"] + ithrow_new_employee_in_dept(("New employee in department X")) + task_introduce_team["Introduce new employee to the team"] + task_training_for_position["Perform training for position"] + task_compile_welcome_package["Compile welcome package"] + task_give_welcome_package["Give employee welcome package"] + end_event(["End Event"]) + end + end + start_candidate_accepted --> task_send_candidate_contract + task_send_candidate_contract --> gw_contract_terms + gw_contract_terms -->|"Yes"| task_get_signature_notify_dept + gw_contract_terms -->|"No"| task_review_terms_of_contract + task_review_terms_of_contract --> task_send_candidate_contract + task_get_signature_notify_dept --> gw_parallel_split + gw_parallel_split --> task_inform_company_policies + task_inform_company_policies --> task_intro_mission_vision_values + task_intro_mission_vision_values --> task_training_time_reports + task_training_time_reports --> task_register_medical_insurance + task_register_medical_insurance --> gw_parallel_merge + gw_parallel_split --> task_request_preparations + task_request_preparations --> ithrow_new_employee_in_dept + ithrow_new_employee_in_dept --> gw_parallel_merge + gw_parallel_merge --> task_introduce_team + task_introduce_team --> task_training_for_position + task_training_for_position --> task_compile_welcome_package + task_compile_welcome_package --> task_give_welcome_package + task_give_welcome_package --> end_event \ No newline at end of file diff --git 
a/data/14_bpmn_2.JSON b/data/14_bpmn_2.JSON new file mode 100644 index 0000000..ffc4cf8 --- /dev/null +++ b/data/14_bpmn_2.JSON @@ -0,0 +1,363 @@ +{ + "metadata": { + "id": "bpmn_2_14", + "source": "C.5.0.bpmn", + "diagram_type": "bpmn_collaboration", + "tier": 2, + "entity_count": 24, + "container_count": 4, + "attachment_count": 0, + "description": "Bank KYC onboarding: private account manager handles identity verification, document checks, parallel personal data and KYC activities, risk assessment; legal entities are referred to B2B department; head of market lane handles approval decisions" + }, + "participants": [ + { + "id": "pool_bank", + "name": "Bank" + } + ], + "lanes": [ + { + "id": "lane_private", + "name": "Private Customer Account Manager", + "pool": "pool_bank" + }, + { + "id": "lane_corporate", + "name": "Corporate Account Manager", + "pool": "pool_bank" + }, + { + "id": "lane_head_market", + "name": "Head of Market Service", + "pool": "pool_bank" + } + ], + "nodes": [ + { + "id": "start_customer_interested", + "name": "Customer interested in Bank offer", + "type": "startEvent", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_interview_customer", + "name": "Interview customer", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_prove_identity", + "name": "Prove/Provide identity", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "gw_legal_or_individual", + "name": "Legal entity or individual?", + "type": "exclusiveGateway", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_obtain_documents", + "name": "Obtain supporting data and documents of the customer", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_check_documents", + "name": "Check customer documents", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "gw_data_complete", + "name": "Data complete?", + "type": 
"exclusiveGateway", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_complete_data", + "name": "Complete data and documents", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_copy_sign_scan", + "name": "Copy, sign, and scan documents", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_file_documents", + "name": "File documents in customer file", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "gw_parallel_split", + "name": "", + "type": "parallelGateway", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_add_personal_data", + "name": "Add personal data", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_kyc", + "name": "Perform know your customer (KYC) activities", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "gw_parallel_merge", + "name": "", + "type": "parallelGateway", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_risk_assessment", + "name": "Perform risk assessment of the customer", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "gw_subject_to_approval", + "name": "Subject to approval?", + "type": "exclusiveGateway", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_create_customer", + "name": "Create customer in the system", + "type": "task", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "end_customer_created", + "name": "Identity determined and new customer created", + "type": "endEvent", + "lane": "lane_private", + "attached_to": null + }, + { + "id": "task_refer_b2b", + "name": "Referral to B2B Department", + "type": "task", + "lane": "lane_corporate", + "attached_to": null + }, + { + "id": "end_referred_b2b", + "name": "Referred to B2B Department", + "type": "endEvent", + "lane": "lane_corporate", + "attached_to": null + }, + { 
+ "id": "task_check_risk", + "name": "Check risk and decide about approval", + "type": "task", + "lane": "lane_head_market", + "attached_to": null + }, + { + "id": "gw_approval", + "name": "Approval?", + "type": "exclusiveGateway", + "lane": "lane_head_market", + "attached_to": null + }, + { + "id": "task_reject_customer", + "name": "Reject customer request", + "type": "task", + "lane": "lane_head_market", + "attached_to": null + }, + { + "id": "end_no_business", + "name": "No business relation created", + "type": "endEvent", + "lane": "lane_head_market", + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "start_customer_interested", + "target": "task_interview_customer" + }, + { + "id": "sf_2", + "name": "", + "source": "task_interview_customer", + "target": "task_prove_identity" + }, + { + "id": "sf_3", + "name": "", + "source": "task_prove_identity", + "target": "gw_legal_or_individual" + }, + { + "id": "sf_4", + "name": "Individual Person", + "source": "gw_legal_or_individual", + "target": "task_obtain_documents" + }, + { + "id": "sf_5", + "name": "Legal Entity", + "source": "gw_legal_or_individual", + "target": "task_refer_b2b" + }, + { + "id": "sf_6", + "name": "", + "source": "task_refer_b2b", + "target": "end_referred_b2b" + }, + { + "id": "sf_7", + "name": "", + "source": "task_obtain_documents", + "target": "task_check_documents" + }, + { + "id": "sf_8", + "name": "", + "source": "task_check_documents", + "target": "gw_data_complete" + }, + { + "id": "sf_9", + "name": "Yes", + "source": "gw_data_complete", + "target": "task_copy_sign_scan" + }, + { + "id": "sf_10", + "name": "No", + "source": "gw_data_complete", + "target": "task_complete_data" + }, + { + "id": "sf_11", + "name": "", + "source": "task_complete_data", + "target": "task_copy_sign_scan" + }, + { + "id": "sf_13", + "name": "", + "source": "task_copy_sign_scan", + "target": "task_file_documents" + }, + { + "id": "sf_14", + "name": "", + "source": 
"task_file_documents", + "target": "gw_parallel_split" + }, + { + "id": "sf_15", + "name": "", + "source": "gw_parallel_split", + "target": "task_add_personal_data" + }, + { + "id": "sf_16", + "name": "", + "source": "gw_parallel_split", + "target": "task_kyc" + }, + { + "id": "sf_17", + "name": "", + "source": "task_add_personal_data", + "target": "gw_parallel_merge" + }, + { + "id": "sf_18", + "name": "", + "source": "task_kyc", + "target": "gw_parallel_merge" + }, + { + "id": "sf_19", + "name": "", + "source": "gw_parallel_merge", + "target": "task_risk_assessment" + }, + { + "id": "sf_20", + "name": "", + "source": "task_risk_assessment", + "target": "gw_subject_to_approval" + }, + { + "id": "sf_21", + "name": "No", + "source": "gw_subject_to_approval", + "target": "task_create_customer" + }, + { + "id": "sf_22", + "name": "Yes", + "source": "gw_subject_to_approval", + "target": "task_check_risk" + }, + { + "id": "sf_23", + "name": "", + "source": "task_check_risk", + "target": "gw_approval" + }, + { + "id": "sf_24", + "name": "Yes", + "source": "gw_approval", + "target": "task_create_customer" + }, + { + "id": "sf_25", + "name": "No", + "source": "gw_approval", + "target": "task_reject_customer" + }, + { + "id": "sf_26", + "name": "", + "source": "task_reject_customer", + "target": "end_no_business" + }, + { + "id": "sf_27", + "name": "", + "source": "task_create_customer", + "target": "end_customer_created" + } + ] +} \ No newline at end of file diff --git a/data/14_bpmn_2_ground_truth.MMD b/data/14_bpmn_2_ground_truth.MMD new file mode 100644 index 0000000..6d7e13d --- /dev/null +++ b/data/14_bpmn_2_ground_truth.MMD @@ -0,0 +1,67 @@ +--- +config: + theme: default +--- +flowchart LR + subgraph pool_bank["Bank"] + direction LR + subgraph lane_private["Private Customer Account Manager"] + direction LR + start_customer_interested(["Customer interested in Bank offer"]) + task_interview_customer["Interview customer"] + task_prove_identity["Prove/Provide identity"] 
+ gw_legal_or_individual{"Legal entity or individual?"} + task_obtain_documents["Obtain supporting data and documents of the customer"] + task_check_documents["Check customer documents"] + gw_data_complete{"Data complete?"} + task_complete_data["Complete data and documents"] + task_copy_sign_scan["Copy, sign, and scan documents"] + task_file_documents["File documents in customer file"] + gw_parallel_split{{"+"}} + task_add_personal_data["Add personal data"] + task_kyc["Perform know your customer (KYC) activities"] + gw_parallel_merge{{"+"}} + task_risk_assessment["Perform risk assessment of the customer"] + gw_subject_to_approval{"Subject to approval?"} + task_create_customer["Create customer in the system"] + end_customer_created(["Identity determined and new customer created"]) + end + subgraph lane_corporate["Corporate Account Manager"] + direction LR + task_refer_b2b["Referral to B2B Department"] + end_referred_b2b(["Referred to B2B Department"]) + end + subgraph lane_head_market["Head of Market Service"] + direction LR + task_check_risk["Check risk and decide about approval"] + gw_approval{"Approval?"} + task_reject_customer["Reject customer request"] + end_no_business(["No business relation created"]) + end + end + start_customer_interested --> task_interview_customer + task_interview_customer --> task_prove_identity + task_prove_identity --> gw_legal_or_individual + gw_legal_or_individual -->|"Individual Person"| task_obtain_documents + gw_legal_or_individual -->|"Legal Entity"| task_refer_b2b + task_refer_b2b --> end_referred_b2b + task_obtain_documents --> task_check_documents + task_check_documents --> gw_data_complete + gw_data_complete -->|"Yes"| task_copy_sign_scan + gw_data_complete -->|"No"| task_complete_data + task_complete_data --> task_copy_sign_scan + task_copy_sign_scan --> task_file_documents + task_file_documents --> gw_parallel_split + gw_parallel_split --> task_add_personal_data + gw_parallel_split --> task_kyc + task_add_personal_data --> 
gw_parallel_merge + task_kyc --> gw_parallel_merge + gw_parallel_merge --> task_risk_assessment + task_risk_assessment --> gw_subject_to_approval + gw_subject_to_approval -->|"No"| task_create_customer + gw_subject_to_approval -->|"Yes"| task_check_risk + task_check_risk --> gw_approval + gw_approval -->|"Yes"| task_create_customer + gw_approval -->|"No"| task_reject_customer + task_reject_customer --> end_no_business + task_create_customer --> end_customer_created \ No newline at end of file diff --git a/data/15_bpmn_2.JSON b/data/15_bpmn_2.JSON new file mode 100644 index 0000000..a3c608f --- /dev/null +++ b/data/15_bpmn_2.JSON @@ -0,0 +1,347 @@ +{ + "metadata": { + "id": "bpmn_2_15", + "source": "C.9.0.bpmn", + "diagram_type": "bpmn_collaboration", + "tier": 2, + "entity_count": 23, + "container_count": 3, + "attachment_count": 1, + "description": "Customer onboarding process: credit score check, automated risk assessment (green/yellow/red), manual check call activity, fraud detection boundary event, plus event subprocesses for timeout and cancellation" + }, + "participants": [ + { + "id": "Participant_00", + "name": "Customer Onboarding" + } + ], + "nodes": [ + { + "id": "StartEvent_ApplicationReceived", + "name": "Application received", + "type": "startEvent", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "ServiceTask_GetCreditScore", + "name": "Get credit score", + "type": "serviceTask", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "BusinessRuleTask_CheckApplicationAutomatically", + "name": "Check application automatically", + "type": "businessRuleTask", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "ExclusiveGateway_Risk", + "name": "Risk?", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "ServiceTask_DeliverPolicy", + "name": "Deliver confirmation", + "type": "serviceTask", + "lane": 
null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "SendTask_SendPolicy", + "name": "Send confirmation", + "type": "serviceTask", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "EndEvent_ApplicationIssued", + "name": "Application issued", + "type": "endEvent", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "ServiceTask_RejectPolicy", + "name": "Reject application", + "type": "serviceTask", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "SendTask_SendRejection", + "name": "Send rejection", + "type": "serviceTask", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "EndEvent_ApplicationRejected", + "name": "Application rejected", + "type": "endEvent", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "Activity_ManualCheck", + "name": "Manual Check", + "type": "callActivity", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "ExclusiveGateway_Decision", + "name": "Decision?", + "type": "exclusiveGateway", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "ErrorBoundaryEvent_FraudDetected", + "name": "Fraud detected", + "type": "boundaryEvent", + "lane": null, + "attached_to": "Activity_ManualCheck" + }, + { + "id": "SendTask_ReportFraud", + "name": "Report fraud", + "type": "sendTask", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "TerminateEvent_ApplicationCanceledFraud", + "name": "Application canceled due to fraud", + "type": "endEvent", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "Activity_1ke2ixr", + "name": "Timeout Handler", + "type": "subProcess", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "StartErrorEvent_Timeout", + "name": "Timeout", + "type": "startEvent", + "lane": null, + 
"attached_to": null, + "parent_subprocess": "Activity_1ke2ixr" + }, + { + "id": "UserTask_HandleTimeout", + "name": "Handle Timeout", + "type": "userTask", + "lane": null, + "attached_to": null, + "parent_subprocess": "Activity_1ke2ixr" + }, + { + "id": "EndMessageEvent_Timeout", + "name": "Timeout handled", + "type": "endEvent", + "lane": null, + "attached_to": null, + "parent_subprocess": "Activity_1ke2ixr" + }, + { + "id": "Activity_0vp33kx", + "name": "Cancellation Handler", + "type": "subProcess", + "lane": null, + "attached_to": null, + "pool": "Participant_00" + }, + { + "id": "StartMessageEvent_CancellationRequested", + "name": "Cancelation requested", + "type": "startEvent", + "lane": null, + "attached_to": null, + "parent_subprocess": "Activity_0vp33kx" + }, + { + "id": "ServiceTask_CancelApplication", + "name": "Cancel application", + "type": "serviceTask", + "lane": null, + "attached_to": null, + "parent_subprocess": "Activity_0vp33kx" + }, + { + "id": "ParallelGateway_CancelApplication", + "name": "", + "type": "parallelGateway", + "lane": null, + "attached_to": null, + "parent_subprocess": "Activity_0vp33kx" + }, + { + "id": "EndMessageEvent_InformCustomer", + "name": "Customer notified about successful cancelation", + "type": "endEvent", + "lane": null, + "attached_to": null, + "parent_subprocess": "Activity_0vp33kx" + }, + { + "id": "EndMessageEvent_InformOperations", + "name": "Operations Team notified about successful cancelation", + "type": "endEvent", + "lane": null, + "attached_to": null, + "parent_subprocess": "Activity_0vp33kx" + } + ], + "sequence_flows": [ + { + "id": "sf_1", + "name": "", + "source": "StartEvent_ApplicationReceived", + "target": "ServiceTask_GetCreditScore" + }, + { + "id": "sf_2", + "name": "", + "source": "ServiceTask_GetCreditScore", + "target": "BusinessRuleTask_CheckApplicationAutomatically" + }, + { + "id": "sf_3", + "name": "", + "source": "BusinessRuleTask_CheckApplicationAutomatically", + "target": 
"ExclusiveGateway_Risk" + }, + { + "id": "sf_4", + "name": "Green (no risk)", + "source": "ExclusiveGateway_Risk", + "target": "ServiceTask_DeliverPolicy" + }, + { + "id": "sf_5", + "name": "Red (severe risk)", + "source": "ExclusiveGateway_Risk", + "target": "ServiceTask_RejectPolicy" + }, + { + "id": "sf_6", + "name": "Yellow (moderate risk)", + "source": "ExclusiveGateway_Risk", + "target": "Activity_ManualCheck" + }, + { + "id": "sf_7", + "name": "", + "source": "ServiceTask_DeliverPolicy", + "target": "SendTask_SendPolicy" + }, + { + "id": "sf_8", + "name": "", + "source": "SendTask_SendPolicy", + "target": "EndEvent_ApplicationIssued" + }, + { + "id": "sf_9", + "name": "", + "source": "ServiceTask_RejectPolicy", + "target": "SendTask_SendRejection" + }, + { + "id": "sf_10", + "name": "", + "source": "SendTask_SendRejection", + "target": "EndEvent_ApplicationRejected" + }, + { + "id": "sf_11", + "name": "", + "source": "Activity_ManualCheck", + "target": "ExclusiveGateway_Decision" + }, + { + "id": "sf_12", + "name": "Application accepted", + "source": "ExclusiveGateway_Decision", + "target": "ServiceTask_DeliverPolicy" + }, + { + "id": "sf_13", + "name": "Application declined", + "source": "ExclusiveGateway_Decision", + "target": "ServiceTask_RejectPolicy" + }, + { + "id": "sf_14", + "name": "", + "source": "ErrorBoundaryEvent_FraudDetected", + "target": "SendTask_ReportFraud" + }, + { + "id": "sf_15", + "name": "", + "source": "SendTask_ReportFraud", + "target": "TerminateEvent_ApplicationCanceledFraud" + }, + { + "id": "sf_16", + "name": "", + "source": "StartErrorEvent_Timeout", + "target": "UserTask_HandleTimeout" + }, + { + "id": "sf_17", + "name": "", + "source": "UserTask_HandleTimeout", + "target": "EndMessageEvent_Timeout" + }, + { + "id": "sf_18", + "name": "", + "source": "StartMessageEvent_CancellationRequested", + "target": "ServiceTask_CancelApplication" + }, + { + "id": "sf_19", + "name": "", + "source": "ServiceTask_CancelApplication", + 
"target": "ParallelGateway_CancelApplication" + }, + { + "id": "sf_20", + "name": "", + "source": "ParallelGateway_CancelApplication", + "target": "EndMessageEvent_InformCustomer" + }, + { + "id": "sf_21", + "name": "", + "source": "ParallelGateway_CancelApplication", + "target": "EndMessageEvent_InformOperations" + } + ] +} \ No newline at end of file diff --git a/data/15_bpmn_2_ground_truth.MMD b/data/15_bpmn_2_ground_truth.MMD new file mode 100644 index 0000000..01f697f --- /dev/null +++ b/data/15_bpmn_2_ground_truth.MMD @@ -0,0 +1,58 @@ +--- +config: + theme: default +--- +flowchart LR + subgraph Participant_00["Customer Onboarding"] + direction LR + StartEvent_ApplicationReceived(["Application received"]) + ServiceTask_GetCreditScore["Get credit score"] + BusinessRuleTask_CheckApplicationAutomatically["Check application automatically"] + ExclusiveGateway_Risk{"Risk?"} + ServiceTask_DeliverPolicy["Deliver confirmation"] + SendTask_SendPolicy["Send confirmation"] + EndEvent_ApplicationIssued(["Application issued"]) + ServiceTask_RejectPolicy["Reject application"] + SendTask_SendRejection["Send rejection"] + EndEvent_ApplicationRejected(["Application rejected"]) + Activity_ManualCheck[["Manual Check"]] + ExclusiveGateway_Decision{"Decision?"} + SendTask_ReportFraud["Report fraud"] + TerminateEvent_ApplicationCanceledFraud(["Application canceled due to fraud"]) + subgraph Activity_1ke2ixr["Timeout Handler"] + direction LR + StartErrorEvent_Timeout(["Timeout"]) + UserTask_HandleTimeout["Handle Timeout"] + EndMessageEvent_Timeout(["Timeout handled"]) + end + subgraph Activity_0vp33kx["Cancellation Handler"] + direction LR + StartMessageEvent_CancellationRequested(["Cancelation requested"]) + ServiceTask_CancelApplication["Cancel application"] + ParallelGateway_CancelApplication{{"+"}} + EndMessageEvent_InformCustomer(["Customer notified about successful cancelation"]) + EndMessageEvent_InformOperations(["Operations Team notified about successful cancelation"]) + end 
+ end + StartEvent_ApplicationReceived --> ServiceTask_GetCreditScore + ServiceTask_GetCreditScore --> BusinessRuleTask_CheckApplicationAutomatically + BusinessRuleTask_CheckApplicationAutomatically --> ExclusiveGateway_Risk + ExclusiveGateway_Risk -->|"Green (no risk)"| ServiceTask_DeliverPolicy + ExclusiveGateway_Risk -->|"Red (severe risk)"| ServiceTask_RejectPolicy + ExclusiveGateway_Risk -->|"Yellow (moderate risk)"| Activity_ManualCheck + ServiceTask_DeliverPolicy --> SendTask_SendPolicy + SendTask_SendPolicy --> EndEvent_ApplicationIssued + ServiceTask_RejectPolicy --> SendTask_SendRejection + SendTask_SendRejection --> EndEvent_ApplicationRejected + Activity_ManualCheck --> ExclusiveGateway_Decision + ExclusiveGateway_Decision -->|"Application accepted"| ServiceTask_DeliverPolicy + ExclusiveGateway_Decision -->|"Application declined"| ServiceTask_RejectPolicy + SendTask_ReportFraud --> TerminateEvent_ApplicationCanceledFraud + StartErrorEvent_Timeout --> UserTask_HandleTimeout + UserTask_HandleTimeout --> EndMessageEvent_Timeout + StartMessageEvent_CancellationRequested --> ServiceTask_CancelApplication + ServiceTask_CancelApplication --> ParallelGateway_CancelApplication + ParallelGateway_CancelApplication --> EndMessageEvent_InformCustomer + ParallelGateway_CancelApplication --> EndMessageEvent_InformOperations + Activity_ManualCheck o--o ErrorBoundaryEvent_FraudDetected(("Fraud detected")) + ErrorBoundaryEvent_FraudDetected --> SendTask_ReportFraud diff --git a/data/16_it_2.JSON b/data/16_it_2.JSON new file mode 100644 index 0000000..95c55a0 --- /dev/null +++ b/data/16_it_2.JSON @@ -0,0 +1,169 @@ +{ + "metadata": { + "id": "it_2_16", + "diagram_type": "c4_container", + "tier": 2, + "entity_count": 11, + "container_count": 1, + "attachment_count": 0, + "description": "C4 Container diagram of SomeApp: extends it_1_07 with a full CI/CD delivery pipeline using GitLab, a CI runner, Terraform for infrastructure provisioning, and automated test execution" + }, 
+ "system_boundary": { + "id": "infomaniak", + "name": "Infomaniak Public Cloud", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "user", + "name": "User", + "type": "person", + "description": "End user accessing the web application via browser" + }, + { + "id": "developer", + "name": "Developer", + "type": "person", + "description": "Developer pushing code and managing infrastructure" + }, + { + "id": "cloudflare", + "name": "Cloudflare", + "type": "external_system", + "description": "CDN and edge proxy. Terminates HTTPS at the edge (www.someapp.xx) and routes traffic to the origin" + }, + { + "id": "gitlab", + "name": "GitLab", + "type": "external_system", + "description": "Source code repository and CI/CD orchestrator. Hosts the codebase and triggers pipeline runs on push" + }, + { + "id": "infomaniak_api", + "name": "Infomaniak API", + "type": "external_system", + "description": "Infomaniak public cloud management API. Used by Terraform to provision and manage infrastructure resources" + }, + { + "id": "ci_runner", + "name": "GitLab CI Runner", + "type": "container", + "technology": "GitLab Runner", + "boundary": "infomaniak", + "description": "Self-hosted CI runner executing pipeline jobs: build, test, and deploy stages" + }, + { + "id": "terraform", + "name": "Terraform", + "type": "container", + "technology": "Infrastructure as Code", + "boundary": "infomaniak", + "description": "Terraform executed by the CI runner to provision and update Infomaniak cloud resources" + }, + { + "id": "test_suite", + "name": "Test Suite", + "type": "container", + "technology": "Automated Tests", + "boundary": "infomaniak", + "description": "Automated test suite (unit and integration) executed by the CI runner as part of the pipeline" + }, + { + "id": "web_app", + "name": "SomeApp", + "type": "container", + "technology": "Web Application", + "boundary": "infomaniak", + "description": "Public-facing web application. 
Deployed by the CI pipeline and receives proxied requests from Cloudflare" + }, + { + "id": "postgres", + "name": "PostgreSQL", + "type": "container", + "technology": "Relational Database", + "boundary": "infomaniak", + "description": "Primary relational database storing application data. Provisioned via Terraform" + }, + { + "id": "object_storage", + "name": "Object Storage", + "type": "container", + "technology": "S3-compatible Bucket", + "boundary": "infomaniak", + "description": "S3-compatible object storage bucket. Provisioned via Terraform" + } + ], + "relationships": [ + { + "id": "r1", + "source": "developer", + "target": "gitlab", + "label": "Pushes code", + "technology": "Git / HTTPS" + }, + { + "id": "r2", + "source": "gitlab", + "target": "ci_runner", + "label": "Triggers pipeline", + "technology": "GitLab CI" + }, + { + "id": "r3", + "source": "ci_runner", + "target": "test_suite", + "label": "Executes tests", + "technology": "CI pipeline" + }, + { + "id": "r4", + "source": "ci_runner", + "target": "terraform", + "label": "Runs plan / apply", + "technology": "CI pipeline" + }, + { + "id": "r5", + "source": "terraform", + "target": "infomaniak_api", + "label": "Provisions infrastructure", + "technology": "REST API" + }, + { + "id": "r6", + "source": "ci_runner", + "target": "web_app", + "label": "Deploys application", + "technology": "SSH / Docker" + }, + { + "id": "r7", + "source": "user", + "target": "cloudflare", + "label": "Accesses via HTTPS", + "technology": "HTTPS", + "note": "www.someapp.xx" + }, + { + "id": "r8", + "source": "cloudflare", + "target": "web_app", + "label": "Reverse proxy / origin request", + "technology": "HTTPS" + }, + { + "id": "r9", + "source": "web_app", + "target": "postgres", + "label": "Reads and writes application data", + "technology": "PostgreSQL" + }, + { + "id": "r10", + "source": "web_app", + "target": "object_storage", + "label": "Reads and writes files", + "technology": "S3 API" + } + ] +} diff --git 
a/data/16_it_2_ground_truth.MMD b/data/16_it_2_ground_truth.MMD new file mode 100644 index 0000000..218b89a --- /dev/null +++ b/data/16_it_2_ground_truth.MMD @@ -0,0 +1,31 @@ +--- +config: + theme: default +--- +flowchart LR + user["User\n[Person]"] + developer["Developer\n[Person]"] + cloudflare["Cloudflare\n[External System]\nCDN / Edge Proxy"] + gitlab["GitLab\n[External System]\nSource Control / CI Orchestrator"] + infomaniak_api["Infomaniak API\n[External System]\nCloud Management API"] + + subgraph infomaniak["Infomaniak Public Cloud"] + direction LR + ci_runner["GitLab CI Runner\n[Container]"] + terraform["Terraform\n[Container]\nInfrastructure as Code"] + test_suite["Test Suite\n[Container]\nAutomated Tests"] + web_app["SomeApp\n[Container]\nWeb Application"] + postgres[("PostgreSQL\n[Container]\nRelational Database")] + object_storage[("Object Storage\n[Container]\nS3-compatible Bucket")] + end + + developer -->|"Push code · Git"| gitlab + gitlab -->|"Trigger pipeline"| ci_runner + ci_runner -->|"Execute tests"| test_suite + ci_runner -->|"Run plan / apply"| terraform + terraform -->|"Provision infrastructure · REST API"| infomaniak_api + ci_runner -->|"Deploy application · SSH"| web_app + user -->|"HTTPS · www.someapp.xx"| cloudflare + cloudflare -->|"Reverse proxy"| web_app + web_app -->|"Read / write · PostgreSQL"| postgres + web_app -->|"Read / write · S3 API"| object_storage diff --git a/data/17_it_2.JSON b/data/17_it_2.JSON new file mode 100644 index 0000000..c22a38c --- /dev/null +++ b/data/17_it_2.JSON @@ -0,0 +1,200 @@ +{ + "metadata": { + "id": "it_2_17", + "diagram_type": "network_topology", + "tier": 2, + "entity_count": 13, + "container_count": 1, + "attachment_count": 0, + "description": "Expanded small office network topology. 
Extends it_1_10 with an explicit ISP/WAN edge, badge/access control system, IP security cameras with NVR, and a POS terminal" + }, + "system_boundary": { + "id": "office_lan", + "name": "Office LAN", + "type": "network_boundary" + }, + "elements": [ + { + "id": "isp", + "name": "ISP", + "type": "external_system", + "description": "Internet Service Provider. Provides WAN uplink to the office router" + }, + { + "id": "router", + "name": "Router", + "type": "device", + "technology": "Router", + "boundary": "office_lan", + "description": "Edge router. WAN uplink to ISP, LAN downlink to firewall" + }, + { + "id": "firewall", + "name": "Firewall", + "type": "device", + "technology": "Firewall", + "boundary": "office_lan", + "description": "Network firewall enforcing traffic policies between WAN and LAN" + }, + { + "id": "switch", + "name": "Switch", + "type": "device", + "technology": "Network Switch", + "boundary": "office_lan", + "description": "Core office switch. Distributes traffic to all wired devices and the access point" + }, + { + "id": "access_point", + "name": "Access Point", + "type": "device", + "technology": "WiFi Access Point", + "boundary": "office_lan", + "description": "Wireless access point providing WiFi connectivity to laptops" + }, + { + "id": "nas", + "name": "NAS", + "type": "server", + "technology": "Network Attached Storage", + "boundary": "office_lan", + "description": "Shared file storage for the office, wired to the switch" + }, + { + "id": "printer", + "name": "Printer", + "type": "peripheral", + "technology": "Network Printer", + "boundary": "office_lan", + "description": "Shared office printer, wired to the switch" + }, + { + "id": "voip_phones", + "name": "VoIP Phones", + "type": "device", + "technology": "VoIP", + "boundary": "office_lan", + "description": "Wired VoIP phones connected to the switch" + }, + { + "id": "user_clients", + "name": "User Clients", + "type": "device", + "technology": "Laptops (WiFi)", + "boundary": "office_lan", 
+ "description": "Employee laptops connecting wirelessly via the access point" + }, + { + "id": "nvr", + "name": "NVR", + "type": "server", + "technology": "Network Video Recorder", + "boundary": "office_lan", + "description": "Network Video Recorder. Receives and stores video feeds from all IP cameras" + }, + { + "id": "security_cameras", + "name": "Security Cameras", + "type": "device", + "technology": "IP Cameras (PoE)", + "boundary": "office_lan", + "description": "IP security cameras powered via PoE from the switch; stream video to the NVR" + }, + { + "id": "access_control", + "name": "Access Control", + "type": "server", + "technology": "Badge / Entry System", + "boundary": "office_lan", + "description": "Server managing badge readers and door controllers for physical access control" + }, + { + "id": "pos_terminal", + "name": "POS Terminal", + "type": "device", + "technology": "Point of Sale", + "boundary": "office_lan", + "description": "Point-of-sale terminal for retail transactions, wired to the switch" + } + ], + "relationships": [ + { + "id": "r1", + "source": "isp", + "target": "router", + "label": "WAN link" + }, + { + "id": "r2", + "source": "router", + "target": "firewall", + "label": "WAN → LAN" + }, + { + "id": "r3", + "source": "firewall", + "target": "switch", + "label": "Filtered traffic" + }, + { + "id": "r4", + "source": "switch", + "target": "nas", + "label": "Wired" + }, + { + "id": "r5", + "source": "switch", + "target": "printer", + "label": "Wired" + }, + { + "id": "r6", + "source": "switch", + "target": "voip_phones", + "label": "Wired" + }, + { + "id": "r7", + "source": "switch", + "target": "access_point", + "label": "Wired uplink" + }, + { + "id": "r8", + "source": "access_point", + "target": "user_clients", + "label": "WiFi" + }, + { + "id": "r9", + "source": "switch", + "target": "security_cameras", + "label": "PoE" + }, + { + "id": "r10", + "source": "security_cameras", + "target": "nvr", + "label": "Video stream" + }, + { + "id": 
"r11", + "source": "switch", + "target": "nvr", + "label": "Wired" + }, + { + "id": "r12", + "source": "switch", + "target": "access_control", + "label": "Wired" + }, + { + "id": "r13", + "source": "switch", + "target": "pos_terminal", + "label": "Wired" + } + ] +} diff --git a/data/17_it_2_ground_truth.MMD b/data/17_it_2_ground_truth.MMD new file mode 100644 index 0000000..7b70009 --- /dev/null +++ b/data/17_it_2_ground_truth.MMD @@ -0,0 +1,36 @@ +--- +config: + theme: default +--- +flowchart LR + isp["ISP\n[External System]\nWAN Provider"] + + subgraph office_lan["Office LAN"] + direction LR + router["Router\n[Device]\nEdge Router"] + firewall["Firewall\n[Device]"] + switch["Switch\n[Device]\nCore Switch"] + access_point["Access Point\n[Device]\nWiFi AP"] + nas[("NAS\n[Server]\nFile Storage")] + printer["Printer\n[Peripheral]"] + voip_phones["VoIP Phones\n[Device]"] + user_clients["User Clients\n[Device]\nLaptops (WiFi)"] + nvr[("NVR\n[Server]\nNetwork Video Recorder")] + security_cameras["Security Cameras\n[Device]\nIP Cameras (PoE)"] + access_control[("Access Control\n[Server]\nBadge / Entry System")] + pos_terminal["POS Terminal\n[Device]"] + end + + isp -->|"WAN link"| router + router -->|"WAN → LAN"| firewall + firewall -->|"Filtered traffic"| switch + switch -->|"Wired"| nas + switch -->|"Wired"| printer + switch -->|"Wired"| voip_phones + switch -->|"Wired uplink"| access_point + access_point -->|"WiFi"| user_clients + switch -->|"PoE"| security_cameras + security_cameras -->|"Video stream"| nvr + switch -->|"Wired"| nvr + switch -->|"Wired"| access_control + switch -->|"Wired"| pos_terminal diff --git a/data/18_it_2.JSON b/data/18_it_2.JSON new file mode 100644 index 0000000..83cc80d --- /dev/null +++ b/data/18_it_2.JSON @@ -0,0 +1,212 @@ +{ + "metadata": { + "id": "it_2_18", + "diagram_type": "c4_container", + "tier": 2, + "entity_count": 11, + "container_count": 2, + "attachment_count": 0, + "description": "C4 Container diagram of an extended GCP data 
analysis stack. Extends it_1_09 with async job processing via Cloud Tasks and a Background Worker, Cloud Storage for file handling, Secret Manager for credential management, and Cloud Monitoring for observability" + }, + "system_boundary": { + "id": "google_cloud", + "name": "Google Cloud", + "type": "cloud_environment" + }, + "elements": [ + { + "id": "user", + "name": "User", + "type": "person", + "description": "Analyst accessing the web app via browser" + }, + { + "id": "iap", + "name": "Google IAP", + "type": "container", + "technology": "Identity-Aware Proxy", + "boundary": "google_cloud", + "description": "Google Identity-Aware Proxy. Enforces authentication before forwarding requests to the web app" + }, + { + "id": "web_app", + "name": "Web App", + "type": "container", + "technology": "Web Application / MCP Host", + "boundary": "google_cloud", + "description": "GCP-hosted web application acting as the MCP host. Sends prompts to Gemini, calls MCP tools, enqueues async jobs, and reads/writes Cloud Storage" + }, + { + "id": "gemini", + "name": "Gemini", + "type": "container", + "technology": "Vertex AI (Google-managed)", + "boundary": "google_cloud", + "description": "Gemini model served via Vertex AI. Processes user prompts and drives MCP tool selection" + }, + { + "id": "mcp_server", + "name": "MCP Server", + "type": "container", + "technology": "MCP Server", + "boundary": "google_cloud", + "description": "MCP server exposing the orders view as a read-only query tool" + }, + { + "id": "cloud_tasks", + "name": "Cloud Tasks", + "type": "container", + "technology": "GCP Cloud Tasks", + "boundary": "google_cloud", + "description": "Managed async task queue. 
Receives jobs enqueued by the web app and dispatches them to the background worker" + }, + { + "id": "background_worker", + "name": "Background Worker", + "type": "container", + "technology": "Cloud Run / Worker Service", + "boundary": "google_cloud", + "description": "Long-running worker service triggered by Cloud Tasks. Reads and writes data to PostgreSQL and stores results in Cloud Storage" + }, + { + "id": "cloud_storage", + "name": "Cloud Storage", + "type": "container", + "technology": "GCS Bucket", + "boundary": "google_cloud", + "description": "Google Cloud Storage bucket. Used by the web app for file uploads/downloads and by the background worker to store job results" + }, + { + "id": "cloud_monitoring", + "name": "Cloud Monitoring", + "type": "container", + "technology": "GCP Cloud Monitoring", + "boundary": "google_cloud", + "description": "Centralized monitoring and logging service. Collects metrics and logs from the web app and background worker" + }, + { + "id": "secret_manager", + "name": "Secret Manager", + "type": "container", + "technology": "GCP Secret Manager", + "boundary": "google_cloud", + "description": "Manages and vends API keys, database credentials, and other secrets to authorised services" + }, + { + "id": "postgres", + "name": "PostgreSQL", + "type": "container", + "technology": "Relational Database", + "boundary": "google_cloud", + "description": "PostgreSQL database on GCP hosting the orders view and application data" + }, + { + "id": "orders_view", + "name": "orders", + "type": "container", + "technology": "Data Store (DB View)", + "boundary": "postgres", + "description": "Read-only PostgreSQL view exposing order data. 
Accessed by the MCP server" + } + ], + "relationships": [ + { + "id": "r1", + "source": "user", + "target": "iap", + "label": "Authenticates", + "technology": "HTTPS" + }, + { + "id": "r2", + "source": "iap", + "target": "web_app", + "label": "Forwards authenticated request", + "technology": "HTTPS" + }, + { + "id": "r3", + "source": "web_app", + "target": "gemini", + "label": "Prompts / responses", + "technology": "Gemini API" + }, + { + "id": "r4", + "source": "web_app", + "target": "mcp_server", + "label": "Calls MCP tools", + "technology": "MCP" + }, + { + "id": "r5", + "source": "mcp_server", + "target": "orders_view", + "label": "Read-only query", + "technology": "SQL" + }, + { + "id": "r6", + "source": "web_app", + "target": "cloud_tasks", + "label": "Enqueue async job", + "technology": "Cloud Tasks API" + }, + { + "id": "r7", + "source": "cloud_tasks", + "target": "background_worker", + "label": "Trigger job", + "technology": "HTTP callback" + }, + { + "id": "r8", + "source": "background_worker", + "target": "postgres", + "label": "Read / write data", + "technology": "SQL" + }, + { + "id": "r9", + "source": "background_worker", + "target": "cloud_storage", + "label": "Store results", + "technology": "GCS API" + }, + { + "id": "r10", + "source": "web_app", + "target": "cloud_storage", + "label": "Upload / download files", + "technology": "GCS API" + }, + { + "id": "r11", + "source": "web_app", + "target": "secret_manager", + "label": "Fetch API keys", + "technology": "Secret Manager API" + }, + { + "id": "r12", + "source": "background_worker", + "target": "secret_manager", + "label": "Fetch credentials", + "technology": "Secret Manager API" + }, + { + "id": "r13", + "source": "web_app", + "target": "cloud_monitoring", + "label": "Emit metrics / logs", + "technology": "Cloud Monitoring API" + }, + { + "id": "r14", + "source": "background_worker", + "target": "cloud_monitoring", + "label": "Emit metrics / logs", + "technology": "Cloud Monitoring API" + } + ] 
+} diff --git a/data/18_it_2_ground_truth.MMD b/data/18_it_2_ground_truth.MMD new file mode 100644 index 0000000..f02396d --- /dev/null +++ b/data/18_it_2_ground_truth.MMD @@ -0,0 +1,38 @@ +--- +config: + theme: default +--- +flowchart LR + user["User\n[Person]\nAnalyst"] + + subgraph google_cloud["Google Cloud"] + direction LR + iap["Google IAP\n[Container]\nIdentity-Aware Proxy"] + web_app["Web App\n[Container]\nMCP Host"] + gemini["Gemini\n[Container]\nVertex AI"] + mcp_server["MCP Server\n[Container]"] + cloud_tasks["Cloud Tasks\n[Container]\nAsync Queue"] + background_worker["Background Worker\n[Container]"] + cloud_storage[("Cloud Storage\n[Container]\nGCS Bucket")] + cloud_monitoring["Cloud Monitoring\n[Container]"] + secret_manager["Secret Manager\n[Container]"] + + subgraph postgres["PostgreSQL"] + orders_view[("orders\n[Data Store]\nDB View")] + end + end + + user -->|"Authenticates · HTTPS"| iap + iap -->|"Forwards request"| web_app + web_app -->|"Prompts / responses · Gemini API"| gemini + web_app -->|"Calls MCP tools · MCP"| mcp_server + mcp_server -->|"Read-only query · SQL"| orders_view + web_app -->|"Enqueue async job"| cloud_tasks + cloud_tasks -->|"Trigger job"| background_worker + background_worker -->|"Read / write · SQL"| postgres + background_worker -->|"Store results · GCS"| cloud_storage + web_app -->|"Upload / download · GCS"| cloud_storage + web_app -->|"Fetch secrets"| secret_manager + background_worker -->|"Fetch credentials"| secret_manager + web_app -->|"Metrics / logs"| cloud_monitoring + background_worker -->|"Metrics / logs"| cloud_monitoring diff --git a/data/19_it_2.JSON b/data/19_it_2.JSON new file mode 100644 index 0000000..b395b6f --- /dev/null +++ b/data/19_it_2.JSON @@ -0,0 +1,254 @@ +{ + "metadata": { + "id": "it_2_19", + "diagram_type": "network_topology", + "tier": 2, + "entity_count": 12, + "container_count": 7, + "attachment_count": 0, + "description": "Dual data center network topology with active/standby load balancing 
and failover. Each data center has a DMZ (firewall + load balancer) and an internal LAN (web app, auth/IAM, database). The global load balancer routes live traffic to DC1 and fails over to DC2. Databases and IAM replicate continuously between DCs" + }, + "system_boundary": { + "id": "enterprise_net", + "name": "Enterprise Network", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "user_clients", + "name": "User Clients", + "type": "person", + "description": "End users accessing the web application via browser over HTTPS" + }, + { + "id": "global_lb", + "name": "Global Load Balancer", + "type": "external_system", + "description": "DNS-based / anycast global traffic manager. Routes requests to DC1 (active) and DC2 (failover)" + }, + { + "id": "dc1", + "name": "Data Center 1 — Primary", + "type": "deployment_environment", + "boundary": "enterprise_net", + "description": "Primary data center. Serves live traffic in normal operation" + }, + { + "id": "dmz1", + "name": "DMZ", + "type": "network_boundary", + "boundary": "dc1", + "description": "Demilitarised zone in DC1. Exposes only the firewall and load balancer to external traffic" + }, + { + "id": "fw1", + "name": "Firewall DC1", + "type": "device", + "technology": "Firewall", + "boundary": "dmz1", + "description": "Stateful firewall in DC1 DMZ. Filters inbound HTTPS traffic from the global load balancer" + }, + { + "id": "lb1", + "name": "Load Balancer DC1", + "type": "device", + "technology": "Application Load Balancer", + "boundary": "dmz1", + "description": "Application load balancer in DC1 DMZ. Distributes HTTPS traffic to DC1 web application servers" + }, + { + "id": "lan1", + "name": "Internal LAN", + "type": "network_boundary", + "boundary": "dc1", + "description": "Internal LAN in DC1. 
Hosts application servers and databases, isolated from the DMZ" + }, + { + "id": "web_app_1", + "name": "Web App 1", + "type": "server", + "technology": "Web Application Server", + "boundary": "lan1", + "description": "Primary web application server. Handles authenticated requests forwarded by the DC1 load balancer" + }, + { + "id": "auth_iam", + "name": "Auth / IAM", + "type": "server", + "technology": "Identity and Access Management", + "boundary": "lan1", + "description": "Primary auth and IAM service. Issues and validates session tokens for the web application" + }, + { + "id": "db_primary", + "name": "PostgreSQL Primary", + "type": "server", + "technology": "PostgreSQL (Read / Write)", + "boundary": "lan1", + "description": "Primary PostgreSQL database. Handles all read and write operations; streams replication to DC2" + }, + { + "id": "dc2", + "name": "Data Center 2 — Standby", + "type": "deployment_environment", + "boundary": "enterprise_net", + "description": "Standby data center. Receives replicated data continuously; promoted to active on DC1 failure" + }, + { + "id": "dmz2", + "name": "DMZ", + "type": "network_boundary", + "boundary": "dc2", + "description": "Demilitarised zone in DC2. Ready to accept live traffic on failover activation" + }, + { + "id": "fw2", + "name": "Firewall DC2", + "type": "device", + "technology": "Firewall", + "boundary": "dmz2", + "description": "Stateful firewall in DC2 DMZ. Filters failover HTTPS traffic from the global load balancer" + }, + { + "id": "lb2", + "name": "Load Balancer DC2", + "type": "device", + "technology": "Application Load Balancer", + "boundary": "dmz2", + "description": "Application load balancer in DC2 DMZ. Distributes traffic to DC2 web application servers on failover" + }, + { + "id": "lan2", + "name": "Internal LAN", + "type": "network_boundary", + "boundary": "dc2", + "description": "Internal LAN in DC2. 
Hosts standby application servers and replica databases" + }, + { + "id": "web_app_2", + "name": "Web App 2", + "type": "server", + "technology": "Web Application Server", + "boundary": "lan2", + "description": "Standby web application server. Serves live traffic when DC2 is promoted to active" + }, + { + "id": "auth_iam_replica", + "name": "Auth / IAM Replica", + "type": "server", + "technology": "Identity and Access Management", + "boundary": "lan2", + "description": "Replica auth and IAM service. Stays in sync with DC1 primary; provides auth locally on failover" + }, + { + "id": "db_replica", + "name": "PostgreSQL Replica", + "type": "server", + "technology": "PostgreSQL (Standby / Read)", + "boundary": "lan2", + "description": "Standby PostgreSQL replica. Receives streaming replication from DC1 primary; promoted to read-write on failover" + } + ], + "relationships": [ + { + "id": "r1", + "source": "user_clients", + "target": "global_lb", + "label": "HTTPS", + "technology": "HTTPS" + }, + { + "id": "r2", + "source": "global_lb", + "target": "fw1", + "label": "Active route", + "technology": "HTTPS" + }, + { + "id": "r3", + "source": "global_lb", + "target": "fw2", + "label": "Failover route", + "technology": "HTTPS", + "note": "Dashed — standby path; activated on DC1 failure" + }, + { + "id": "r4", + "source": "fw1", + "target": "lb1", + "label": "Filtered HTTPS", + "technology": "HTTPS" + }, + { + "id": "r5", + "source": "lb1", + "target": "web_app_1", + "label": "HTTP", + "technology": "HTTP" + }, + { + "id": "r6", + "source": "web_app_1", + "target": "auth_iam", + "label": "Auth request", + "technology": "Internal API" + }, + { + "id": "r7", + "source": "web_app_1", + "target": "db_primary", + "label": "Read / Write", + "technology": "SQL" + }, + { + "id": "r8", + "source": "fw2", + "target": "lb2", + "label": "Filtered HTTPS", + "technology": "HTTPS" + }, + { + "id": "r9", + "source": "lb2", + "target": "web_app_2", + "label": "HTTP", + "technology": "HTTP" + 
}, + { + "id": "r10", + "source": "web_app_2", + "target": "auth_iam_replica", + "label": "Auth request", + "technology": "Internal API" + }, + { + "id": "r11", + "source": "web_app_2", + "target": "db_replica", + "label": "Read / Standby", + "technology": "SQL" + }, + { + "id": "r12", + "source": "db_primary", + "target": "db_replica", + "label": "Streaming replication", + "technology": "PostgreSQL WAL" + }, + { + "id": "r13", + "source": "auth_iam", + "target": "auth_iam_replica", + "label": "Sync", + "technology": "Internal replication" + }, + { + "id": "r14", + "source": "fw1", + "target": "fw2", + "label": "Encrypted WAN", + "technology": "IPsec", + "note": "Bidirectional inter-DC link for replication traffic" + } + ] +} diff --git a/data/19_it_2_ground_truth.MMD b/data/19_it_2_ground_truth.MMD new file mode 100644 index 0000000..4560aae --- /dev/null +++ b/data/19_it_2_ground_truth.MMD @@ -0,0 +1,48 @@ +--- +config: + theme: default +--- +flowchart LR + user_clients["User Clients\n[Person]"] + global_lb["Global Load Balancer\n[External System]\nDNS / Anycast"] + + subgraph enterprise_net["Enterprise Network"] + subgraph dc1["Data Center 1 — Primary"] + subgraph dmz1["DMZ"] + fw1["Firewall\n[Device]"] + lb1["Load Balancer\n[Device]"] + end + subgraph lan1["Internal LAN"] + web_app_1["Web App 1\n[Server]"] + auth_iam["Auth / IAM\n[Server]\nPrimary"] + db_primary[("PostgreSQL Primary\n[Server]")] + end + end + + subgraph dc2["Data Center 2 — Standby"] + subgraph dmz2["DMZ"] + fw2["Firewall\n[Device]"] + lb2["Load Balancer\n[Device]"] + end + subgraph lan2["Internal LAN"] + web_app_2["Web App 2\n[Server]"] + auth_iam_replica["Auth / IAM\n[Server]\nReplica"] + db_replica[("PostgreSQL Replica\n[Server]")] + end + end + end + + user_clients -->|"HTTPS"| global_lb + global_lb -->|"Active"| fw1 + global_lb -.->|"Failover"| fw2 + fw1 -->|"Filtered HTTPS"| lb1 + lb1 -->|"HTTP"| web_app_1 + web_app_1 -->|"Auth request"| auth_iam + web_app_1 -->|"Read / Write · SQL"| 
db_primary + fw2 -->|"Filtered HTTPS"| lb2 + lb2 -->|"HTTP"| web_app_2 + web_app_2 -->|"Auth request"| auth_iam_replica + web_app_2 -->|"Read / Standby · SQL"| db_replica + db_primary -->|"Streaming replication · WAL"| db_replica + auth_iam -->|"Sync"| auth_iam_replica + fw1 <-->|"Encrypted WAN · IPsec"| fw2 diff --git a/data/20_it_2.JSON b/data/20_it_2.JSON new file mode 100644 index 0000000..d99db90 --- /dev/null +++ b/data/20_it_2.JSON @@ -0,0 +1,210 @@ +{ + "metadata": { + "id": "it_2_20", + "diagram_type": "c4_container", + "tier": 2, + "entity_count": 12, + "container_count": 2, + "attachment_count": 0, + "description": "C4 Container diagram of a hybrid cloud / on-premises architecture. External users reach the application via a cloud CDN and load balancer, which routes requests through an IPsec VPN tunnel to on-premises application servers. Internal users access on-premises resources directly. The cloud layer provides VPN termination, object storage for backups, and centralised monitoring" + }, + "system_boundary": { + "id": "on_prem_dc", + "name": "On-Premises Data Center", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "external_users", + "name": "External Users", + "type": "person", + "description": "Remote users accessing the application from the public internet via browser" + }, + { + "id": "internal_users", + "name": "Internal Users", + "type": "person", + "description": "Office employees accessing on-premises resources directly or via VPN" + }, + { + "id": "cloud_env", + "name": "Cloud Environment", + "type": "deployment_environment", + "description": "Public cloud layer providing CDN/LB, VPN termination, object storage, and monitoring" + }, + { + "id": "cdn_lb", + "name": "CDN / Load Balancer", + "type": "container", + "technology": "Cloud CDN + Load Balancer", + "boundary": "cloud_env", + "description": "Cloud-managed CDN and application load balancer. 
Terminates public HTTPS traffic and routes it to the cloud VPN gateway" + }, + { + "id": "vpn_gw_cloud", + "name": "VPN Gateway (Cloud)", + "type": "container", + "technology": "Cloud VPN Gateway", + "boundary": "cloud_env", + "description": "Cloud-side IPsec VPN gateway. Establishes an encrypted tunnel to the on-premises VPN gateway" + }, + { + "id": "cloud_storage", + "name": "Cloud Storage", + "type": "container", + "technology": "Object Storage (S3-compatible)", + "boundary": "cloud_env", + "description": "Object storage bucket. Receives backup syncs from the on-premises file server and application server" + }, + { + "id": "cloud_monitoring", + "name": "Cloud Monitoring", + "type": "container", + "technology": "Managed Monitoring Service", + "boundary": "cloud_env", + "description": "Centralised monitoring and alerting. Collects metrics and logs pushed by the on-premises application server" + }, + { + "id": "vpn_gw_onprem", + "name": "VPN Gateway (On-Prem)", + "type": "container", + "technology": "On-Premises VPN Appliance", + "boundary": "on_prem_dc", + "description": "On-premises IPsec VPN gateway. Terminates the cloud-side tunnel and forwards traffic to the internal firewall" + }, + { + "id": "firewall", + "name": "Firewall", + "type": "container", + "technology": "Network Firewall", + "boundary": "on_prem_dc", + "description": "On-premises firewall. Filters traffic between the VPN gateway and internal servers" + }, + { + "id": "app_server", + "name": "App Server", + "type": "container", + "technology": "Web Application Server", + "boundary": "on_prem_dc", + "description": "Core application server. Handles business logic, authenticates against Active Directory, and reads/writes the database and file server" + }, + { + "id": "db_server", + "name": "Database", + "type": "container", + "technology": "PostgreSQL", + "boundary": "on_prem_dc", + "description": "On-premises PostgreSQL database. 
Stores all application data" + }, + { + "id": "file_server", + "name": "File Server", + "type": "container", + "technology": "NFS File Server", + "boundary": "on_prem_dc", + "description": "Shared NFS file server. Stores documents and media; syncs to cloud object storage for backup" + }, + { + "id": "ad_server", + "name": "Active Directory", + "type": "container", + "technology": "LDAP / Active Directory", + "boundary": "on_prem_dc", + "description": "On-premises Active Directory. Provides authentication and IAM for internal users and the application server" + } + ], + "relationships": [ + { + "id": "r1", + "source": "external_users", + "target": "cdn_lb", + "label": "HTTPS", + "technology": "HTTPS" + }, + { + "id": "r2", + "source": "internal_users", + "target": "vpn_gw_onprem", + "label": "VPN / direct", + "technology": "VPN / LAN" + }, + { + "id": "r3", + "source": "internal_users", + "target": "ad_server", + "label": "Login · LDAP", + "technology": "LDAP" + }, + { + "id": "r4", + "source": "cdn_lb", + "target": "vpn_gw_cloud", + "label": "Route to on-prem", + "technology": "HTTPS" + }, + { + "id": "r5", + "source": "vpn_gw_cloud", + "target": "vpn_gw_onprem", + "label": "IPsec VPN tunnel", + "technology": "IPsec", + "note": "Bidirectional encrypted tunnel" + }, + { + "id": "r6", + "source": "vpn_gw_onprem", + "target": "firewall", + "label": "Internal traffic", + "technology": "LAN" + }, + { + "id": "r7", + "source": "firewall", + "target": "app_server", + "label": "Filtered HTTPS", + "technology": "HTTPS" + }, + { + "id": "r8", + "source": "app_server", + "target": "ad_server", + "label": "Auth · LDAP", + "technology": "LDAP" + }, + { + "id": "r9", + "source": "app_server", + "target": "db_server", + "label": "Read / write", + "technology": "SQL" + }, + { + "id": "r10", + "source": "app_server", + "target": "file_server", + "label": "File access", + "technology": "NFS" + }, + { + "id": "r11", + "source": "app_server", + "target": "cloud_storage", + "label": 
"Sync files", + "technology": "S3 API" + }, + { + "id": "r12", + "source": "file_server", + "target": "cloud_storage", + "label": "Backup", + "technology": "S3 API" + }, + { + "id": "r13", + "source": "app_server", + "target": "cloud_monitoring", + "label": "Metrics / logs", + "technology": "HTTPS" + } + ] +} diff --git a/data/20_it_2_ground_truth.MMD b/data/20_it_2_ground_truth.MMD new file mode 100644 index 0000000..0cbcb5e --- /dev/null +++ b/data/20_it_2_ground_truth.MMD @@ -0,0 +1,39 @@ +--- +config: + theme: default +--- +flowchart LR + external_users["External Users\n[Person]\nRemote / Browser"] + internal_users["Internal Users\n[Person]\nOffice / VPN"] + + subgraph cloud_env["Cloud Environment"] + direction LR + cdn_lb["CDN / Load Balancer\n[Container]\nEdge LB"] + vpn_gw_cloud["VPN Gateway\n[Container]\nCloud Side"] + cloud_storage[("Cloud Storage\n[Container]\nObject Storage")] + cloud_monitoring["Cloud Monitoring\n[Container]"] + end + + subgraph on_prem_dc["On-Premises Data Center"] + direction LR + vpn_gw_onprem["VPN Gateway\n[Container]\nOn-Prem Side"] + firewall["Firewall\n[Container]"] + app_server["App Server\n[Container]\nCore Application"] + db_server[("Database\n[Container]\nPostgreSQL")] + file_server[("File Server\n[Container]\nNFS")] + ad_server["Active Directory\n[Container]\nAuth / IAM"] + end + + external_users -->|"HTTPS"| cdn_lb + internal_users -->|"VPN / direct"| vpn_gw_onprem + internal_users -->|"Login · LDAP"| ad_server + cdn_lb -->|"Route to on-prem"| vpn_gw_cloud + vpn_gw_cloud <-->|"IPsec VPN tunnel"| vpn_gw_onprem + vpn_gw_onprem -->|"Internal traffic"| firewall + firewall -->|"Filtered HTTPS"| app_server + app_server -->|"Auth · LDAP"| ad_server + app_server -->|"Read / write · SQL"| db_server + app_server -->|"File access · NFS"| file_server + app_server -->|"Sync files · S3"| cloud_storage + file_server -->|"Backup · S3"| cloud_storage + app_server -->|"Metrics / logs"| cloud_monitoring diff --git a/data/21_bpmn_3.JSON 
b/data/21_bpmn_3.JSON new file mode 100644 index 0000000..7bc3fdb --- /dev/null +++ b/data/21_bpmn_3.JSON @@ -0,0 +1,450 @@ +{ + "metadata": { + "id": "bpmn_3_21", + "source": "B.1.0.bpmn", + "diagram_type": "bpmn_collaboration", + "tier": 3, + "entity_count": 29, + "container_count": 5, + "attachment_count": 0, + "description": "Two-pool collaboration with lanes, mixed task types (abstract/user/service), collapsed and expanded sub-processes, three call activities (global task, expanded, collapsed), message start/end events, timer start, and terminate end event" + }, + "participants": [ + { + "id": "pool_participant", + "name": "Participant", + "lanes": [] + }, + { + "id": "pool_pool", + "name": "Pool", + "lanes": [ + "lane_1", + "lane_2" + ] + }, + { + "id": "pool_standalone", + "name": "Standalone Processes", + "lanes": [] + } + ], + "lanes": [ + { + "id": "lane_1", + "name": "Lane 1", + "pool": "pool_pool" + }, + { + "id": "lane_2", + "name": "Lane 2", + "pool": "pool_pool" + } + ], + "nodes": [ + { + "id": "start_none_1", + "name": "Start Event None 1", + "type": "startEvent", + "pool": "pool_standalone", + "lane": null, + "attached_to": null + }, + { + "id": "task_4", + "name": "Abstract Task 4", + "type": "task", + "pool": "pool_standalone", + "lane": null, + "attached_to": null + }, + { + "id": "end_none_2", + "name": "End Event None 2", + "type": "endEvent", + "pool": "pool_standalone", + "lane": null, + "attached_to": null + }, + { + "id": "start_none_3", + "name": "Start Event None 3", + "type": "startEvent", + "pool": "pool_standalone", + "lane": null, + "attached_to": null + }, + { + "id": "task_8", + "name": "Abstract Task 8", + "type": "task", + "pool": "pool_standalone", + "lane": null, + "attached_to": null + }, + { + "id": "end_none_4", + "name": "End Event None 4", + "type": "endEvent", + "pool": "pool_standalone", + "lane": null, + "attached_to": null + }, + { + "id": "start_timer", + "name": "Start Event Timer", + "type": "startEvent", + "pool": 
"pool_participant", + "lane": null, + "attached_to": null + }, + { + "id": "task_1", + "name": "Abstract Task 1", + "type": "task", + "pool": "pool_participant", + "lane": null, + "attached_to": null + }, + { + "id": "user_2", + "name": "User Task 2", + "type": "userTask", + "pool": "pool_participant", + "lane": null, + "attached_to": null + }, + { + "id": "svc_3", + "name": "Service Task 3", + "type": "serviceTask", + "pool": "pool_participant", + "lane": null, + "attached_to": null + }, + { + "id": "end_none_1", + "name": "End Event None 1", + "type": "endEvent", + "pool": "pool_participant", + "lane": null, + "attached_to": null + }, + { + "id": "start_msg", + "name": "Start Event Message", + "type": "startEvent", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "pgw_div", + "name": "Parallel Gateway Divergence", + "type": "parallelGateway", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "gw_div_1", + "name": "Exclusive Gateway Divergence 1", + "type": "exclusiveGateway", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "gw_conv_1", + "name": "Exclusive Gateway Convergence 1", + "type": "exclusiveGateway", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "call_global", + "name": "Call Activity Calling a Global Task", + "type": "callActivity", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "call_expanded", + "name": "Call Activity - Expanded", + "type": "callActivity", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "call_collapsed", + "name": "Call Activity Collapsed", + "type": "callActivity", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "end_msg", + "name": "End Event Message", + "type": "endEvent", + "pool": "pool_pool", + "lane": "lane_1", + "attached_to": null + }, + { + "id": "start_none_2", + "name": "Start Event None 
2", + "type": "startEvent", + "pool": "pool_pool", + "lane": null, + "attached_to": null + }, + { + "id": "task_6", + "name": "Abstract Task 6", + "type": "task", + "pool": "pool_pool", + "lane": null, + "attached_to": null + }, + { + "id": "end_none_3", + "name": "End Event None 3", + "type": "endEvent", + "pool": "pool_pool", + "lane": null, + "attached_to": null + }, + { + "id": "user_5", + "name": "User Task 5", + "type": "userTask", + "pool": "pool_pool", + "lane": "lane_2", + "attached_to": null + }, + { + "id": "svc_7", + "name": "Service Task 7", + "type": "serviceTask", + "pool": "pool_pool", + "lane": "lane_2", + "attached_to": null + }, + { + "id": "sub_collapsed", + "name": "Collapsed Sub-Process", + "type": "subProcess", + "pool": "pool_pool", + "lane": "lane_2", + "attached_to": null + }, + { + "id": "sub_expanded", + "name": "Sub Process - Expanded", + "type": "subProcess", + "pool": "pool_pool", + "lane": "lane_2", + "attached_to": null + }, + { + "id": "gw_div_2", + "name": "Exclusive Gateway Divergence 2", + "type": "exclusiveGateway", + "pool": "pool_pool", + "lane": "lane_2", + "attached_to": null + }, + { + "id": "gw_conv_2", + "name": "Exclusive Gateway Convergence 2", + "type": "exclusiveGateway", + "pool": "pool_pool", + "lane": "lane_2", + "attached_to": null + }, + { + "id": "end_terminate", + "name": "End Event Terminate", + "type": "endEvent", + "pool": "pool_pool", + "lane": "lane_2", + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf1", + "source": "start_none_1", + "target": "task_4", + "name": "" + }, + { + "id": "sf2", + "source": "task_4", + "target": "end_none_2", + "name": "" + }, + { + "id": "sf3", + "source": "start_none_3", + "target": "task_8", + "name": "" + }, + { + "id": "sf4", + "source": "task_8", + "target": "end_none_4", + "name": "" + }, + { + "id": "sf5", + "source": "start_timer", + "target": "task_1", + "name": "" + }, + { + "id": "sf6", + "source": "task_1", + "target": "user_2", + "name": "" + 
}, + { + "id": "sf7", + "source": "user_2", + "target": "svc_3", + "name": "" + }, + { + "id": "sf8", + "source": "svc_3", + "target": "end_none_1", + "name": "" + }, + { + "id": "sf9", + "source": "start_none_2", + "target": "task_6", + "name": "" + }, + { + "id": "sf10", + "source": "task_6", + "target": "end_none_3", + "name": "" + }, + { + "id": "sf11", + "source": "start_msg", + "target": "pgw_div", + "name": "" + }, + { + "id": "sf12", + "source": "pgw_div", + "target": "user_5", + "name": "" + }, + { + "id": "sf13", + "source": "pgw_div", + "target": "gw_div_1", + "name": "" + }, + { + "id": "sf14", + "source": "gw_div_1", + "target": "call_collapsed", + "name": "" + }, + { + "id": "sf15", + "source": "gw_div_1", + "target": "call_global", + "name": "" + }, + { + "id": "sf16", + "source": "call_collapsed", + "target": "call_expanded", + "name": "" + }, + { + "id": "sf17", + "source": "call_global", + "target": "gw_conv_1", + "name": "" + }, + { + "id": "sf18", + "source": "call_expanded", + "target": "gw_conv_1", + "name": "" + }, + { + "id": "sf19", + "source": "gw_conv_1", + "target": "end_msg", + "name": "" + }, + { + "id": "sf20", + "source": "user_5", + "target": "gw_div_2", + "name": "" + }, + { + "id": "sf21", + "source": "gw_div_2", + "target": "sub_collapsed", + "name": "" + }, + { + "id": "sf22", + "source": "gw_div_2", + "target": "svc_7", + "name": "" + }, + { + "id": "sf23", + "source": "sub_collapsed", + "target": "sub_expanded", + "name": "" + }, + { + "id": "sf24", + "source": "sub_expanded", + "target": "gw_conv_2", + "name": "" + }, + { + "id": "sf25", + "source": "svc_7", + "target": "gw_conv_2", + "name": "" + }, + { + "id": "sf26", + "source": "gw_conv_2", + "target": "end_terminate", + "name": "" + } + ], + "message_flows": [ + { + "id": "mf1", + "source": "task_1", + "target": "start_msg", + "name": "Message Flow 1" + }, + { + "id": "mf2", + "source": "end_msg", + "target": "svc_3", + "name": "Message Flow 2" + } + ] +} \ No newline at 
end of file diff --git a/data/21_bpmn_3_ground_truth.MMD b/data/21_bpmn_3_ground_truth.MMD new file mode 100644 index 0000000..bdfb939 --- /dev/null +++ b/data/21_bpmn_3_ground_truth.MMD @@ -0,0 +1,93 @@ +--- +config: + theme: default +--- +flowchart LR + subgraph pool_standalone["Standalone Processes"] + direction LR + start_none_1(["Start Event None 1"]) + task_4["Abstract Task 4"] + end_none_2(["End Event None 2"]) + start_none_3(["Start Event None 3"]) + task_8["Abstract Task 8"] + end_none_4(["End Event None 4"]) + end + + subgraph pool_participant["Participant"] + direction LR + start_timer(["Start Event Timer"]) + task_1["Abstract Task 1"] + user_2["User Task 2"] + svc_3["Service Task 3"] + end_none_1(["End Event None 1"]) + end + + subgraph pool_pool["Pool"] + direction LR + start_none_2(["Start Event None 2"]) + task_6["Abstract Task 6"] + end_none_3(["End Event None 3"]) + + subgraph lane_1["Lane 1"] + direction LR + start_msg(["Start Event Message"]) + pgw_div{{"Parallel Gateway Divergence"}} + gw_div_1{"Exclusive Gateway Divergence 1"} + gw_conv_1{"Exclusive Gateway Convergence 1"} + call_global[["Call Activity Calling a Global Task"]] + call_expanded[["Call Activity - Expanded"]] + call_collapsed[["Call Activity Collapsed"]] + end_msg(["End Event Message"]) + end + + subgraph lane_2["Lane 2"] + direction LR + user_5["User Task 5"] + svc_7["Service Task 7"] + sub_collapsed[["Collapsed Sub-Process"]] + sub_expanded[["Sub Process - Expanded"]] + gw_div_2{"Exclusive Gateway Divergence 2"} + gw_conv_2{"Exclusive Gateway Convergence 2"} + end_terminate(["End Event Terminate"]) + end + end + + %% Standalone processes + start_none_1 --> task_4 + task_4 --> end_none_2 + start_none_3 --> task_8 + task_8 --> end_none_4 + + %% Participant pool + start_timer --> task_1 + task_1 --> user_2 + user_2 --> svc_3 + svc_3 --> end_none_1 + + %% Pool — no-lane path + start_none_2 --> task_6 + task_6 --> end_none_3 + + %% Pool — message path + start_msg --> pgw_div + pgw_div 
--> user_5 + pgw_div --> gw_div_1 + gw_div_1 --> call_collapsed + gw_div_1 --> call_global + call_collapsed --> call_expanded + call_global --> gw_conv_1 + call_expanded --> gw_conv_1 + gw_conv_1 --> end_msg + + %% Pool — Lane 2 path + user_5 --> gw_div_2 + gw_div_2 --> sub_collapsed + gw_div_2 --> svc_7 + sub_collapsed --> sub_expanded + sub_expanded --> gw_conv_2 + svc_7 --> gw_conv_2 + gw_conv_2 --> end_terminate + + %% Message flows + task_1 -.->|"Message Flow 1"| start_msg + end_msg -.->|"Message Flow 2"| svc_3 diff --git a/data/22_bpmn_3.JSON b/data/22_bpmn_3.JSON new file mode 100644 index 0000000..a80d2c9 --- /dev/null +++ b/data/22_bpmn_3.JSON @@ -0,0 +1,475 @@ +{ + "metadata": { + "id": "bpmn_3_22", + "source": "C.2.0.bpmn", + "diagram_type": "bpmn_collaboration", + "tier": 3, + "entity_count": 30, + "container_count": 6, + "attachment_count": 1, + "description": "Four-pool e-commerce collaboration: Customer browses and checks out (Checkout sub-process), sends payment to Credit Card Company, Amazon picks and packages in two lanes (Picker/Packager), Carrier delivers. Five cross-pool message flows, error boundary on Checkout." 
+ }, + "participants": [ + { + "id": "pool_customer", + "name": "Customer", + "lanes": [] + }, + { + "id": "pool_amazon", + "name": "Amazon", + "lanes": [ + "lane_picker", + "lane_packager" + ] + }, + { + "id": "pool_carrier", + "name": "Carrier", + "lanes": [] + }, + { + "id": "pool_cc", + "name": "Credit Card Company", + "lanes": [] + } + ], + "lanes": [ + { + "id": "lane_picker", + "name": "Picker", + "pool": "pool_amazon" + }, + { + "id": "lane_packager", + "name": "Packager", + "pool": "pool_amazon" + } + ], + "nodes": [ + { + "id": "start_cust_browse", + "name": "", + "type": "startEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "task_browse", + "name": "Browse Products on Amazon", + "type": "task", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "task_add_cart", + "name": "Add Item to Cart", + "type": "task", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "xgw_done_shopping", + "name": "Done Shopping?", + "type": "exclusiveGateway", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "sub_checkout", + "name": "Checkout", + "type": "subProcess", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "boundary_checkout_err", + "name": "", + "type": "boundaryEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": "sub_checkout" + }, + { + "id": "start_cust_pay", + "name": "", + "type": "startEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "task_pay_order", + "name": "Pay Order", + "type": "task", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "xgw_payment_accepted", + "name": "Payment accepted?", + "type": "exclusiveGateway", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "xgw_retry", + "name": "Retry?", + "type": "exclusiveGateway", + "pool": "pool_customer", + "lane": 
null, + "attached_to": null + }, + { + "id": "ithrow_send_order", + "name": "Send Order", + "type": "intermediateThrowEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "task_receive_items", + "name": "Receive items", + "type": "task", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "end_cust_checkout", + "name": "", + "type": "endEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "end_cust_pay_fail", + "name": "", + "type": "endEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "end_cust_send", + "name": "", + "type": "endEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "end_cust_received", + "name": "", + "type": "endEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "end_cust_err", + "name": "", + "type": "endEvent", + "pool": "pool_customer", + "lane": null, + "attached_to": null + }, + { + "id": "start_amz_order", + "name": "Receive Order", + "type": "startEvent", + "pool": "pool_amazon", + "lane": "lane_picker", + "attached_to": null + }, + { + "id": "task_pick", + "name": "Pick items", + "type": "task", + "pool": "pool_amazon", + "lane": "lane_picker", + "attached_to": null + }, + { + "id": "task_bin", + "name": "Place in bin", + "type": "task", + "pool": "pool_amazon", + "lane": "lane_picker", + "attached_to": null + }, + { + "id": "task_package", + "name": "Receive and Package items", + "type": "task", + "pool": "pool_amazon", + "lane": "lane_packager", + "attached_to": null + }, + { + "id": "task_carrier_dock", + "name": "Send to carrier dock", + "type": "task", + "pool": "pool_amazon", + "lane": "lane_packager", + "attached_to": null + }, + { + "id": "end_amz", + "name": "", + "type": "endEvent", + "pool": "pool_amazon", + "lane": "lane_packager", + "attached_to": null + }, + { + "id": "start_carrier", + "name": "Pick items", + 
"type": "startEvent", + "pool": "pool_carrier", + "lane": null, + "attached_to": null + }, + { + "id": "task_load_truck", + "name": "Load Truck", + "type": "task", + "pool": "pool_carrier", + "lane": null, + "attached_to": null + }, + { + "id": "task_deliver", + "name": "Deliver Items", + "type": "task", + "pool": "pool_carrier", + "lane": null, + "attached_to": null + }, + { + "id": "end_carrier", + "name": "", + "type": "endEvent", + "pool": "pool_carrier", + "lane": null, + "attached_to": null + }, + { + "id": "start_cc", + "name": "Receive Credit Card Information", + "type": "startEvent", + "pool": "pool_cc", + "lane": null, + "attached_to": null + }, + { + "id": "task_take_payment", + "name": "Take Payment", + "type": "task", + "pool": "pool_cc", + "lane": null, + "attached_to": null + }, + { + "id": "end_cc", + "name": "Send Result", + "type": "endEvent", + "pool": "pool_cc", + "lane": null, + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf1", + "source": "start_cust_browse", + "target": "task_browse", + "name": "" + }, + { + "id": "sf2", + "source": "task_browse", + "target": "task_add_cart", + "name": "" + }, + { + "id": "sf3", + "source": "task_add_cart", + "target": "xgw_done_shopping", + "name": "" + }, + { + "id": "sf4", + "source": "xgw_done_shopping", + "target": "task_browse", + "name": "No" + }, + { + "id": "sf5", + "source": "xgw_done_shopping", + "target": "sub_checkout", + "name": "Yes" + }, + { + "id": "sf6", + "source": "sub_checkout", + "target": "task_receive_items", + "name": "" + }, + { + "id": "sf7", + "source": "task_receive_items", + "target": "end_cust_received", + "name": "" + }, + { + "id": "sf8", + "source": "boundary_checkout_err", + "target": "end_cust_err", + "name": "" + }, + { + "id": "sf9", + "source": "start_cust_pay", + "target": "task_pay_order", + "name": "" + }, + { + "id": "sf10", + "source": "task_pay_order", + "target": "xgw_payment_accepted", + "name": "" + }, + { + "id": "sf11", + "source": 
"xgw_payment_accepted", + "target": "xgw_retry", + "name": "No" + }, + { + "id": "sf12", + "source": "xgw_payment_accepted", + "target": "ithrow_send_order", + "name": "Yes" + }, + { + "id": "sf13", + "source": "xgw_retry", + "target": "task_pay_order", + "name": "Yes" + }, + { + "id": "sf14", + "source": "xgw_retry", + "target": "end_cust_pay_fail", + "name": "No" + }, + { + "id": "sf15", + "source": "ithrow_send_order", + "target": "end_cust_send", + "name": "" + }, + { + "id": "sf16", + "source": "start_amz_order", + "target": "task_pick", + "name": "" + }, + { + "id": "sf17", + "source": "task_pick", + "target": "task_bin", + "name": "" + }, + { + "id": "sf18", + "source": "task_bin", + "target": "task_package", + "name": "" + }, + { + "id": "sf19", + "source": "task_package", + "target": "task_carrier_dock", + "name": "" + }, + { + "id": "sf20", + "source": "task_carrier_dock", + "target": "end_amz", + "name": "" + }, + { + "id": "sf21", + "source": "start_carrier", + "target": "task_load_truck", + "name": "" + }, + { + "id": "sf22", + "source": "task_load_truck", + "target": "task_deliver", + "name": "" + }, + { + "id": "sf23", + "source": "task_deliver", + "target": "end_carrier", + "name": "" + }, + { + "id": "sf24", + "source": "start_cc", + "target": "task_take_payment", + "name": "" + }, + { + "id": "sf25", + "source": "task_take_payment", + "target": "end_cc", + "name": "" + } + ], + "message_flows": [ + { + "id": "mf1", + "source": "ithrow_send_order", + "target": "start_amz_order", + "name": "" + }, + { + "id": "mf2", + "source": "task_pay_order", + "target": "start_cc", + "name": "Send Credit Card Information" + }, + { + "id": "mf3", + "source": "end_cc", + "target": "task_pay_order", + "name": "" + }, + { + "id": "mf4", + "source": "task_carrier_dock", + "target": "start_carrier", + "name": "" + }, + { + "id": "mf5", + "source": "task_deliver", + "target": "task_receive_items", + "name": "" + } + ] +} \ No newline at end of file diff --git 
a/data/22_bpmn_3_ground_truth.MMD b/data/22_bpmn_3_ground_truth.MMD new file mode 100644 index 0000000..b20b1b3 --- /dev/null +++ b/data/22_bpmn_3_ground_truth.MMD @@ -0,0 +1,99 @@ +--- +config: + theme: default +--- +flowchart LR + subgraph pool_customer["Customer"] + direction LR + start_cust_browse(["Start"]) + task_browse["Browse Products on Amazon"] + task_add_cart["Add Item to Cart"] + xgw_done_shopping{"Done Shopping?"} + sub_checkout[["Checkout"]] + boundary_checkout_err(("Error")) + task_receive_items["Receive items"] + end_cust_received(["End"]) + end_cust_err(["Error End"]) + start_cust_pay(["Start"]) + task_pay_order["Pay Order"] + xgw_payment_accepted{"Payment accepted?"} + xgw_retry{"Retry?"} + ithrow_send_order(("Send Order")) + end_cust_send(["End"]) + end_cust_pay_fail(["End"]) + end_cust_checkout(["End"]) + end + + subgraph pool_amazon["Amazon"] + direction LR + subgraph lane_picker["Picker"] + direction LR + start_amz_order(["Receive Order"]) + task_pick["Pick items"] + task_bin["Place in bin"] + end + subgraph lane_packager["Packager"] + direction LR + task_package["Receive and Package items"] + task_carrier_dock["Send to carrier dock"] + end_amz(["End"]) + end + end + + subgraph pool_carrier["Carrier"] + direction LR + start_carrier(["Pick items"]) + task_load_truck["Load Truck"] + task_deliver["Deliver Items"] + end_carrier(["End"]) + end + + subgraph pool_cc["Credit Card Company"] + direction LR + start_cc(["Receive Credit Card Information"]) + task_take_payment["Take Payment"] + end_cc(["Send Result"]) + end + + %% Customer — browse loop + start_cust_browse --> task_browse + task_browse --> task_add_cart + task_add_cart --> xgw_done_shopping + xgw_done_shopping -->|"No"| task_browse + xgw_done_shopping -->|"Yes"| sub_checkout + sub_checkout --> task_receive_items + task_receive_items --> end_cust_received + sub_checkout o--o boundary_checkout_err + boundary_checkout_err --> end_cust_err + + %% Customer — pay loop + start_cust_pay --> 
task_pay_order + task_pay_order --> xgw_payment_accepted + xgw_payment_accepted -->|"No"| xgw_retry + xgw_payment_accepted -->|"Yes"| ithrow_send_order + xgw_retry -->|"Yes"| task_pay_order + xgw_retry -->|"No"| end_cust_pay_fail + ithrow_send_order --> end_cust_send + + %% Amazon + start_amz_order --> task_pick + task_pick --> task_bin + task_bin --> task_package + task_package --> task_carrier_dock + task_carrier_dock --> end_amz + + %% Carrier + start_carrier --> task_load_truck + task_load_truck --> task_deliver + task_deliver --> end_carrier + + %% Credit Card + start_cc --> task_take_payment + task_take_payment --> end_cc + + %% Message flows + ithrow_send_order -.->|"Order"| start_amz_order + task_pay_order -.->|"Send Credit Card Information"| start_cc + end_cc -.->|"Result"| task_pay_order + task_carrier_dock -.->|"Shipment"| start_carrier + task_deliver -.->|"Delivery"| task_receive_items diff --git a/data/23_bpmn_3.JSON b/data/23_bpmn_3.JSON new file mode 100644 index 0000000..27eb0db --- /dev/null +++ b/data/23_bpmn_3.JSON @@ -0,0 +1,467 @@ +{ + "metadata": { + "id": "bpmn_3_23", + "source": "C.6.0.bpmn", + "diagram_type": "bpmn_process", + "tier": 3, + "entity_count": 32, + "container_count": 1, + "attachment_count": 8, + "description": "Single-pool Travel Booking process with compensation patterns: event-based gateway, parallel split/join, 6 send tasks, 6 service tasks (book/cancel hotel & flight, charge card, update record), intermediate catch events, 4 boundary events (2 compensation on the booking tasks, 2 error on charge-card and the Make Booking sub-process), and two compensate-throw events that compensate the Make Booking sub-process on payment or booking failure." 
+ }, + "participants": [], + "lanes": [], + "nodes": [ + { + "id": "start_travel_request", + "name": "Receive Travel Request", + "type": "startEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "start_blank", + "name": "", + "type": "startEvent", + "pool": null, + "lane": null, + "attached_to": null, + "parent_subprocess": "sub_make_booking" + }, + { + "id": "end_offer_expired", + "name": "Offer Expired", + "type": "endEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "end_req_cancelled", + "name": "Request Cancelled", + "type": "endEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "end_booking_confirmed", + "name": "Booking Confirmed", + "type": "endEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "end_failed_credit", + "name": "Failed Credit Transaction", + "type": "endEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "end_failed_booking", + "name": "Failed Booking", + "type": "endEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "end_travel_booked", + "name": "Travel Booked", + "type": "endEvent", + "pool": null, + "lane": null, + "attached_to": null, + "parent_subprocess": "sub_make_booking" + }, + { + "id": "svc_book_hotel", + "name": "Book Hotel", + "type": "serviceTask", + "pool": null, + "lane": null, + "attached_to": null, + "parent_subprocess": "sub_make_booking" + }, + { + "id": "svc_book_flight", + "name": "Book Flight", + "type": "serviceTask", + "pool": null, + "lane": null, + "attached_to": null, + "parent_subprocess": "sub_make_booking" + }, + { + "id": "svc_cancel_hotel", + "name": "Cancel Hotel", + "type": "serviceTask", + "pool": null, + "lane": null, + "attached_to": null, + "parent_subprocess": "sub_make_booking" + }, + { + "id": "svc_cancel_flight", + "name": "Cancel Flight", + "type": "serviceTask", + "pool": null, + "lane": null, + "attached_to": null, + 
"parent_subprocess": "sub_make_booking" + }, + { + "id": "svc_charge_card", + "name": "Charge Credit Card", + "type": "serviceTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "svc_update_customer", + "name": "Update Customer Record", + "type": "serviceTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "send_cc_info", + "name": "Request Credit Card Information", + "type": "sendTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "send_confirm", + "name": "Confirm Booking", + "type": "sendTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "send_failed_booking", + "name": "Notify Failed Booking", + "type": "sendTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "send_offer", + "name": "Make Flights and Hotel Offer", + "type": "sendTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "send_offer_expired", + "name": "Notify Customer Offer Expired", + "type": "sendTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "send_failed_credit", + "name": "Notify Failed Credit Transaction", + "type": "sendTask", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "sub_make_booking", + "name": "Make Booking", + "type": "subProcess", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "pgw_3", + "name": "", + "type": "parallelGateway", + "pool": null, + "lane": null, + "attached_to": null, + "parent_subprocess": "sub_make_booking" + }, + { + "id": "pgw_4", + "name": "", + "type": "parallelGateway", + "pool": null, + "lane": null, + "attached_to": null, + "parent_subprocess": "sub_make_booking" + }, + { + "id": "ebgw_1", + "name": "", + "type": "eventBasedGateway", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "ithrow_booking_comp", + "name": "Compensate Bookings", + "type": "intermediateThrowEvent", + "pool": null, + "lane": null, + 
"attached_to": null + }, + { + "id": "ithrow_comp_booking", + "name": "Compensate Bookings", + "type": "intermediateThrowEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "icatch_offer_approved", + "name": "Offer Approved", + "type": "intermediateCatchEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "icatch_cancel", + "name": "Cancel Request", + "type": "intermediateCatchEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "icatch_24h", + "name": "24 Hours", + "type": "intermediateCatchEvent", + "pool": null, + "lane": null, + "attached_to": null + }, + { + "id": "boundary_hotel_comp", + "name": "Hotel", + "type": "boundaryEvent", + "pool": null, + "lane": null, + "attached_to": "svc_book_hotel", + "parent_subprocess": "sub_make_booking" + }, + { + "id": "boundary_flight_comp", + "name": "Flight", + "type": "boundaryEvent", + "pool": null, + "lane": null, + "attached_to": "svc_book_flight", + "parent_subprocess": "sub_make_booking" + }, + { + "id": "boundary_card_err", + "name": "", + "type": "boundaryEvent", + "pool": null, + "lane": null, + "attached_to": "svc_charge_card" + }, + { + "id": "boundary_booking_err", + "name": "", + "type": "boundaryEvent", + "pool": null, + "lane": null, + "attached_to": "sub_make_booking" + } + ], + "sequence_flows": [ + { + "id": "sf1", + "source": "start_travel_request", + "target": "send_offer", + "name": "" + }, + { + "id": "sf2", + "source": "send_offer", + "target": "ebgw_1", + "name": "" + }, + { + "id": "sf3", + "source": "ebgw_1", + "target": "icatch_offer_approved", + "name": "" + }, + { + "id": "sf4", + "source": "ebgw_1", + "target": "icatch_cancel", + "name": "" + }, + { + "id": "sf5", + "source": "ebgw_1", + "target": "icatch_24h", + "name": "" + }, + { + "id": "sf6", + "source": "icatch_offer_approved", + "target": "send_cc_info", + "name": "" + }, + { + "id": "sf7", + "source": "send_cc_info", + "target": "sub_make_booking", + 
"name": "" + }, + { + "id": "sf8", + "source": "sub_make_booking", + "target": "svc_charge_card", + "name": "" + }, + { + "id": "sf9", + "source": "svc_charge_card", + "target": "send_confirm", + "name": "" + }, + { + "id": "sf10", + "source": "send_confirm", + "target": "end_booking_confirmed", + "name": "" + }, + { + "id": "sf11", + "source": "start_blank", + "target": "pgw_3", + "name": "" + }, + { + "id": "sf12", + "source": "pgw_3", + "target": "svc_book_hotel", + "name": "" + }, + { + "id": "sf13", + "source": "pgw_3", + "target": "svc_book_flight", + "name": "" + }, + { + "id": "sf14", + "source": "svc_book_hotel", + "target": "pgw_4", + "name": "" + }, + { + "id": "sf15", + "source": "svc_book_flight", + "target": "pgw_4", + "name": "" + }, + { + "id": "sf16", + "source": "pgw_4", + "target": "end_travel_booked", + "name": "" + }, + { + "id": "sf23", + "source": "boundary_card_err", + "target": "ithrow_booking_comp", + "name": "" + }, + { + "id": "sf24", + "source": "ithrow_booking_comp", + "target": "send_failed_credit", + "name": "" + }, + { + "id": "sf25", + "source": "send_failed_credit", + "target": "end_failed_credit", + "name": "" + }, + { + "id": "sf26", + "source": "boundary_booking_err", + "target": "ithrow_comp_booking", + "name": "" + }, + { + "id": "sf27", + "source": "send_failed_booking", + "target": "end_failed_booking", + "name": "" + }, + { + "id": "sf29", + "source": "icatch_cancel", + "target": "svc_update_customer", + "name": "" + }, + { + "id": "sf30", + "source": "svc_update_customer", + "target": "end_req_cancelled", + "name": "" + }, + { + "id": "sf31", + "source": "icatch_24h", + "target": "send_offer_expired", + "name": "" + }, + { + "id": "sf32", + "source": "send_offer_expired", + "target": "end_offer_expired", + "name": "" + }, + { + "id": "sf33", + "source": "ithrow_comp_booking", + "target": "send_failed_booking", + "name": "" + } + ], + "message_flows": [], + "compensation_associations": [ + { + "source": 
"boundary_hotel_comp", + "target": "svc_cancel_hotel" + }, + { + "source": "boundary_flight_comp", + "target": "svc_cancel_flight" + }, + { + "source": "ithrow_booking_comp", + "target": "sub_make_booking" + }, + { + "source": "ithrow_comp_booking", + "target": "sub_make_booking" + } + ] +} \ No newline at end of file diff --git a/data/23_bpmn_3_ground_truth.MMD b/data/23_bpmn_3_ground_truth.MMD new file mode 100644 index 0000000..1cfc4a3 --- /dev/null +++ b/data/23_bpmn_3_ground_truth.MMD @@ -0,0 +1,94 @@ +--- +config: + theme: default +--- +flowchart LR + %% Top-level nodes — not inside any sub-process + start_travel_request(["Receive Travel Request"]) + send_offer["Make Flights and Hotel Offer"] + ebgw_1{{"⊗ Event-Based Gateway"}} + icatch_offer_approved(("Offer Approved")) + icatch_cancel(("Cancel Request")) + icatch_24h(("24 Hours")) + send_cc_info["Request Credit Card Information"] + svc_charge_card["Charge Credit Card"] + boundary_card_err(("Error")) + boundary_booking_err(("Error")) + ithrow_booking_comp(("Compensate Bookings ⟲")) + ithrow_comp_booking(("Compensate Bookings ⟲")) + send_confirm["Confirm Booking"] + send_failed_credit["Notify Failed Credit Transaction"] + send_failed_booking["Notify Failed Booking"] + svc_update_customer["Update Customer Record"] + send_offer_expired["Notify Customer Offer Expired"] + end_booking_confirmed(["Booking Confirmed"]) + end_failed_credit(["Failed Credit Transaction"]) + end_failed_booking(["Failed Booking"]) + end_req_cancelled(["Request Cancelled"]) + end_offer_expired(["Offer Expired"]) + + subgraph sub_make_booking["Make Booking"] + direction LR + start_blank(["1"]) + pgw_3{{"+ Parallel Split"}} + svc_book_hotel["Book Hotel"] + boundary_hotel_comp(("Hotel ⟲")) + svc_cancel_hotel["Cancel Hotel ⟲"] + svc_book_flight["Book Flight"] + boundary_flight_comp(("Flight ⟲")) + svc_cancel_flight["Cancel Flight ⟲"] + pgw_4{{"+ Parallel Join"}} + end_travel_booked(["Travel Booked"]) + end + + %% Main path + 
start_travel_request --> send_offer + send_offer --> ebgw_1 + ebgw_1 --> icatch_offer_approved + ebgw_1 --> icatch_cancel + ebgw_1 --> icatch_24h + + %% Offer approved → booking → charge → confirm + icatch_offer_approved --> send_cc_info + send_cc_info --> sub_make_booking + sub_make_booking --> svc_charge_card + svc_charge_card --> send_confirm + send_confirm --> end_booking_confirmed + + %% Make Booking internals + start_blank --> pgw_3 + pgw_3 --> svc_book_hotel + pgw_3 --> svc_book_flight + svc_book_hotel --> pgw_4 + svc_book_flight --> pgw_4 + pgw_4 --> end_travel_booked + + %% Boundary event attachments via o--o: host activity to boundary event + svc_book_hotel o--o boundary_hotel_comp + svc_book_flight o--o boundary_flight_comp + svc_charge_card o--o boundary_card_err + sub_make_booking o--o boundary_booking_err + + %% Compensation handlers via o--o: boundary events to cancel tasks isForCompensation + boundary_hotel_comp o--o svc_cancel_hotel + boundary_flight_comp o--o svc_cancel_flight + + %% Payment failure: charge card error → compensate the bookings → notify failed credit + boundary_card_err --> ithrow_booking_comp + ithrow_booking_comp o--o sub_make_booking + ithrow_booking_comp --> send_failed_credit + send_failed_credit --> end_failed_credit + + %% Booking failure: Make Booking error → compensate the bookings → notify failed booking + boundary_booking_err --> ithrow_comp_booking + ithrow_comp_booking o--o sub_make_booking + ithrow_comp_booking --> send_failed_booking + send_failed_booking --> end_failed_booking + + %% Cancel path + icatch_cancel --> svc_update_customer + svc_update_customer --> end_req_cancelled + + %% 24h expiry + icatch_24h --> send_offer_expired + send_offer_expired --> end_offer_expired diff --git a/data/24_bpmn_3.JSON b/data/24_bpmn_3.JSON new file mode 100644 index 0000000..36bbf54 --- /dev/null +++ b/data/24_bpmn_3.JSON @@ -0,0 +1,329 @@ +{ + "metadata": { + "id": "bpmn_3_24", + "source": "C.9.2.bpmn (expanded)", + 
"diagram_type": "bpmn_collaboration", + "tier": 3, + "entity_count": 23, + "container_count": 1, + "attachment_count": 1, + "description": "Manual Check process (expanded from C.9.2). Adds a parallel split for concurrent fraud and risk checks, an escalation gateway routing detected fraud to a Senior Reviewer, and an intermediate message catch event awaiting additional documents. Separate start events drive the document-request, accelerated-decision, and fraud-evaluation flows." + }, + "participants": [ + { + "id": "pool_manual_check", + "name": "Manual Check", + "lanes": [] + } + ], + "lanes": [], + "nodes": [ + { + "id": "start_decide_manually", + "name": "Decide Manually", + "type": "startEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "start_doc_requested", + "name": "Document requested", + "type": "startEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "start_accelerated", + "name": "Accelerated decision", + "type": "startEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "start_fraud_suspected", + "name": "Fraud suspected", + "type": "startEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "end_doc_received", + "name": "Document Received", + "type": "endEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "end_decision_accelerated", + "name": "Decision accelerated", + "type": "endEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "end_fraud_detected", + "name": "Fraud Detected", + "type": "endEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "end_fraud_not_detected", + "name": "Fraud not detected", + "type": "endEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "end_timeout", + "name": "Timeout", + "type": "endEvent", 
+ "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "end_manually_decided", + "name": "Manually Decided", + "type": "endEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "user_decide_on_app", + "name": "Decide on application", + "type": "userTask", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "user_accelerate", + "name": "Accelerate decision making", + "type": "userTask", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "user_check_fraud", + "name": "Check for Fraud", + "type": "userTask", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "send_notify_delay", + "name": "Notify customer about delay", + "type": "sendTask", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "call_doc_request", + "name": "Document Request", + "type": "callActivity", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "boundary_timeout_7d", + "name": "Timeout (7 days)", + "type": "boundaryEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": "user_decide_on_app" + }, + { + "id": "xgw_fraud_detected", + "name": "Fraud detected?", + "type": "exclusiveGateway", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "pgw_split_fraud_risk", + "name": "Split: Fraud + Risk", + "type": "parallelGateway", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "user_assess_risk", + "name": "Assess Risk Level", + "type": "userTask", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "pgw_join_fraud_risk", + "name": "Join: Checks Complete", + "type": "parallelGateway", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "xgw_escalate", + "name": "Escalate to Senior?", + "type": 
"exclusiveGateway", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "user_senior_review", + "name": "Senior Reviewer Check", + "type": "userTask", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + }, + { + "id": "icatch_additional_info", + "name": "Additional Info Received", + "type": "intermediateCatchEvent", + "pool": "pool_manual_check", + "lane": null, + "attached_to": null + } + ], + "sequence_flows": [ + { + "id": "sf1", + "source": "start_decide_manually", + "target": "user_decide_on_app", + "name": "" + }, + { + "id": "sf2", + "source": "user_decide_on_app", + "target": "end_manually_decided", + "name": "" + }, + { + "id": "sf3", + "source": "boundary_timeout_7d", + "target": "end_timeout", + "name": "" + }, + { + "id": "sf4", + "source": "start_doc_requested", + "target": "call_doc_request", + "name": "" + }, + { + "id": "sf5", + "source": "call_doc_request", + "target": "icatch_additional_info", + "name": "" + }, + { + "id": "sf6", + "source": "icatch_additional_info", + "target": "end_doc_received", + "name": "" + }, + { + "id": "sf7", + "source": "start_accelerated", + "target": "send_notify_delay", + "name": "" + }, + { + "id": "sf8", + "source": "send_notify_delay", + "target": "user_accelerate", + "name": "" + }, + { + "id": "sf9", + "source": "user_accelerate", + "target": "end_decision_accelerated", + "name": "" + }, + { + "id": "sf10", + "source": "start_fraud_suspected", + "target": "pgw_split_fraud_risk", + "name": "" + }, + { + "id": "sf11", + "source": "pgw_split_fraud_risk", + "target": "user_check_fraud", + "name": "" + }, + { + "id": "sf12", + "source": "pgw_split_fraud_risk", + "target": "user_assess_risk", + "name": "" + }, + { + "id": "sf13", + "source": "user_check_fraud", + "target": "pgw_join_fraud_risk", + "name": "" + }, + { + "id": "sf14", + "source": "user_assess_risk", + "target": "pgw_join_fraud_risk", + "name": "" + }, + { + "id": "sf15", + "source": 
"pgw_join_fraud_risk", + "target": "xgw_fraud_detected", + "name": "" + }, + { + "id": "sf16", + "source": "xgw_fraud_detected", + "target": "end_fraud_not_detected", + "name": "No" + }, + { + "id": "sf17", + "source": "xgw_fraud_detected", + "target": "xgw_escalate", + "name": "Yes" + }, + { + "id": "sf18", + "source": "xgw_escalate", + "target": "user_senior_review", + "name": "Escalate" + }, + { + "id": "sf19", + "source": "xgw_escalate", + "target": "end_fraud_detected", + "name": "Confirmed" + }, + { + "id": "sf20", + "source": "user_senior_review", + "target": "end_fraud_not_detected", + "name": "Cleared" + } + ], + "message_flows": [] +} \ No newline at end of file diff --git a/data/24_bpmn_3_ground_truth.MMD b/data/24_bpmn_3_ground_truth.MMD new file mode 100644 index 0000000..b928672 --- /dev/null +++ b/data/24_bpmn_3_ground_truth.MMD @@ -0,0 +1,72 @@ +--- +config: + theme: default +--- +flowchart LR + subgraph pool_manual_check["Manual Check"] + direction LR + + %% Manual decision path + start_decide_manually(["Decide Manually"]) + user_decide_on_app["Decide on application"] + boundary_timeout_7d(("Timeout (7 days) ⏱")) + end_manually_decided(["Manually Decided"]) + end_timeout(["Timeout"]) + + %% Document path + start_doc_requested(["Document requested ✉"]) + call_doc_request[["Document Request"]] + icatch_additional_info(("Additional Info Received ✉")) + end_doc_received(["Document Received"]) + + %% Accelerated path + start_accelerated(["Accelerated decision ⏱"]) + send_notify_delay["Notify customer about delay"] + user_accelerate["Accelerate decision making"] + end_decision_accelerated(["Decision accelerated"]) + + %% Fraud path — expanded with parallel split + escalation + start_fraud_suspected(["Fraud suspected ✉"]) + pgw_split_fraud_risk{{"+ Split: Fraud + Risk"}} + user_check_fraud["Check for Fraud"] + user_assess_risk["Assess Risk Level"] + xgw_fraud_detected{"Fraud detected?"} + pgw_join_fraud_risk{{"+ Join: Checks Complete"}} + 
end_fraud_not_detected(["Fraud not detected"]) + xgw_escalate{"Escalate to Senior?"} + user_senior_review["Senior Reviewer Check"] + end_fraud_detected(["Fraud Detected"]) + end + + %% Manual decision + start_decide_manually --> user_decide_on_app + user_decide_on_app --> end_manually_decided + user_decide_on_app o--o boundary_timeout_7d + boundary_timeout_7d --> end_timeout + + %% Document path (extended) + start_doc_requested --> call_doc_request + call_doc_request --> icatch_additional_info + icatch_additional_info --> end_doc_received + + %% Accelerated path + start_accelerated --> send_notify_delay + send_notify_delay --> user_accelerate + user_accelerate --> end_decision_accelerated + + %% Fraud path — parallel split + start_fraud_suspected --> pgw_split_fraud_risk + pgw_split_fraud_risk --> user_check_fraud + pgw_split_fraud_risk --> user_assess_risk + user_check_fraud --> pgw_join_fraud_risk + user_assess_risk --> pgw_join_fraud_risk + + %% Fraud gateway routing — both checks join before the decision + pgw_join_fraud_risk --> xgw_fraud_detected + xgw_fraud_detected -->|"No"| end_fraud_not_detected + + %% Escalation path + xgw_fraud_detected -->|"Yes"| xgw_escalate + xgw_escalate -->|"Escalate"| user_senior_review + xgw_escalate -->|"Confirmed"| end_fraud_detected + user_senior_review -->|"Cleared"| end_fraud_not_detected diff --git a/data/25_bpmn_3.JSON b/data/25_bpmn_3.JSON new file mode 100644 index 0000000..e74985f --- /dev/null +++ b/data/25_bpmn_3.JSON @@ -0,0 +1,391 @@ +{ + "metadata": { + "id": "bpmn_3_25", + "source": "C.8.1.bpmn (expanded)", + "diagram_type": "bpmn_collaboration", + "tier": 3, + "entity_count": 27, + "container_count": 1, + "attachment_count": 2, + "description": "Vacation Request process (expanded from C.8.1). 
Adds a balance check (serviceTask + exclusiveGateway) before the business-rule engine, an HR committee review branch (exclusiveGateway + userTask + intermediate message catch event) for complex requests with a notification send task and interrupting timer boundary escalation, and Insufficient Balance / HR Timeout end events." + }, + "participants": [ + { + "id": "pool_vacation", + "name": "Vacation Request", + "lanes": [] + } + ], + "lanes": [], + "nodes": [ + { + "id": "start_req", + "name": "Vacation Request Received", + "type": "startEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "end_refused_auto", + "name": "Vacation Refused Automatically", + "type": "endEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "end_approved_auto", + "name": "Vacation Approved Automatically", + "type": "endEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "end_approved_manager", + "name": "Vacation Approved by Manager", + "type": "endEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "end_refused_manager", + "name": "Vacation Refused by Manager", + "type": "endEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "end_employee_not_found", + "name": "Employee not found", + "type": "endEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "end_insufficient_balance", + "name": "Insufficient Vacation Balance", + "type": "endEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "end_hr_timeout", + "name": "HR Review Timeout", + "type": "endEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "svc_fetch_info", + "name": "Fetch Vacation Information", + "type": "serviceTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "svc_update_remaining_auto", + 
"name": "Update Remaining Vacation", + "type": "serviceTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "svc_update_remaining_mgr", + "name": "Update Remaining Vacation", + "type": "serviceTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "svc_check_balance", + "name": "Check Remaining Days", + "type": "serviceTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "brt_approval", + "name": "Vacation Approval", + "type": "businessRuleTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "user_manual_approve", + "name": "Manually Approve Vacation", + "type": "userTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "user_hr_committee", + "name": "HR Committee Review", + "type": "userTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "send_notify_refusal_auto", + "name": "Notify Employee of Refusal", + "type": "sendTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "send_notify_approval_auto", + "name": "Notify Employee of Approval", + "type": "sendTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "send_notify_approval_mgr", + "name": "Notify Employee of Approval", + "type": "sendTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "send_notify_refusal_mgr", + "name": "Notify Employee of Refusal", + "type": "sendTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "send_notify_hr_committee_input", + "name": "Notify HR Committee for Input", + "type": "sendTask", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "xgw_approval_result", + "name": "", + "type": "exclusiveGateway", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": 
"xgw_manual_result", + "name": "", + "type": "exclusiveGateway", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "xgw_sufficient_balance", + "name": "Sufficient Balance?", + "type": "exclusiveGateway", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "xgw_needs_hr_review", + "name": "Needs HR Review?", + "type": "exclusiveGateway", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "boundary_fetch_err", + "name": "", + "type": "boundaryEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": "svc_fetch_info" + }, + { + "id": "icatch_hr_decision", + "name": "HR Decision Received", + "type": "intermediateCatchEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": null + }, + { + "id": "boundary_hr_timer", + "name": "HR Review Deadline", + "type": "boundaryEvent", + "pool": "pool_vacation", + "lane": null, + "attached_to": "user_hr_committee" + } + ], + "sequence_flows": [ + { + "id": "sf1", + "source": "start_req", + "target": "svc_fetch_info", + "name": "" + }, + { + "id": "sf2", + "source": "svc_fetch_info", + "target": "svc_check_balance", + "name": "" + }, + { + "id": "sf3", + "source": "svc_check_balance", + "target": "xgw_sufficient_balance", + "name": "" + }, + { + "id": "sf4", + "source": "xgw_sufficient_balance", + "target": "brt_approval", + "name": "Yes" + }, + { + "id": "sf5", + "source": "xgw_sufficient_balance", + "target": "end_insufficient_balance", + "name": "No" + }, + { + "id": "sf6", + "source": "boundary_fetch_err", + "target": "end_employee_not_found", + "name": "" + }, + { + "id": "sf7", + "source": "brt_approval", + "target": "xgw_approval_result", + "name": "" + }, + { + "id": "sf8", + "source": "xgw_approval_result", + "target": "send_notify_refusal_auto", + "name": "Refused" + }, + { + "id": "sf9", + "source": "send_notify_refusal_auto", + "target": "end_refused_auto", + "name": "" + }, + { + "id": "sf10", + "source": 
"xgw_approval_result", + "target": "send_notify_approval_auto", + "name": "Approved" + }, + { + "id": "sf11", + "source": "send_notify_approval_auto", + "target": "svc_update_remaining_auto", + "name": "" + }, + { + "id": "sf12", + "source": "svc_update_remaining_auto", + "target": "end_approved_auto", + "name": "" + }, + { + "id": "sf13", + "source": "xgw_approval_result", + "target": "xgw_needs_hr_review", + "name": "Manual Validation Required" + }, + { + "id": "sf14", + "source": "xgw_needs_hr_review", + "target": "user_manual_approve", + "name": "No" + }, + { + "id": "sf15", + "source": "xgw_needs_hr_review", + "target": "send_notify_hr_committee_input", + "name": "Yes" + }, + { + "id": "sf15a", + "source": "send_notify_hr_committee_input", + "target": "user_hr_committee", + "name": "" + }, + { + "id": "sf16", + "source": "user_hr_committee", + "target": "icatch_hr_decision", + "name": "" + }, + { + "id": "sf16a", + "source": "boundary_hr_timer", + "target": "end_hr_timeout", + "name": "" + }, + { + "id": "sf17", + "source": "icatch_hr_decision", + "target": "xgw_manual_result", + "name": "" + }, + { + "id": "sf18", + "source": "user_manual_approve", + "target": "xgw_manual_result", + "name": "" + }, + { + "id": "sf19", + "source": "xgw_manual_result", + "target": "send_notify_approval_mgr", + "name": "Approved" + }, + { + "id": "sf20", + "source": "xgw_manual_result", + "target": "send_notify_refusal_mgr", + "name": "Refused" + }, + { + "id": "sf21", + "source": "send_notify_approval_mgr", + "target": "svc_update_remaining_mgr", + "name": "" + }, + { + "id": "sf22", + "source": "svc_update_remaining_mgr", + "target": "end_approved_manager", + "name": "" + }, + { + "id": "sf23", + "source": "send_notify_refusal_mgr", + "target": "end_refused_manager", + "name": "" + } + ], + "message_flows": [] +} \ No newline at end of file diff --git a/data/25_bpmn_3_ground_truth.MMD b/data/25_bpmn_3_ground_truth.MMD new file mode 100644 index 0000000..b99f3bd --- /dev/null 
+++ b/data/25_bpmn_3_ground_truth.MMD @@ -0,0 +1,83 @@ +--- +config: + theme: default +--- +flowchart LR + subgraph pool_vacation["Vacation Request"] + direction LR + + start_req(["Vacation Request Received"]) + + %% Balance check (new) + svc_fetch_info["Fetch Vacation Information"] + boundary_fetch_err(("Error")) + end_employee_not_found(["Employee not found"]) + svc_check_balance["Check Remaining Days"] + xgw_sufficient_balance{"Sufficient Balance?"} + end_insufficient_balance(["Insufficient Vacation Balance"]) + + %% Auto decision + brt_approval["Vacation Approval"] + xgw_approval_result{"Vacation Approval"} + + %% Auto paths + send_notify_refusal_auto["Notify Employee of Refusal"] + end_refused_auto(["Vacation Refused Automatically"]) + send_notify_approval_auto["Notify Employee of Approval"] + svc_update_remaining_auto["Update Remaining Vacation"] + end_approved_auto(["Vacation Approved Automatically"]) + + %% HR review routing (new) + xgw_needs_hr_review{"Needs HR Review?"} + send_notify_hr_committee_input["Notify HR Committee for Input"] + user_hr_committee["HR Committee Review"] + icatch_hr_decision(("HR Decision Received ✉")) + boundary_hr_timer(("HR Review Deadline")) + end_hr_timeout(["HR Review Timeout"]) + + %% Manual approval + user_manual_approve["Manually Approve Vacation"] + xgw_manual_result{"Vacation Approved"} + + %% Manager paths + send_notify_approval_mgr["Notify Employee of Approval"] + svc_update_remaining_mgr["Update Remaining Vacation"] + end_approved_manager(["Vacation Approved by Manager"]) + send_notify_refusal_mgr["Notify Employee of Refusal"] + end_refused_manager(["Vacation Refused by Manager"]) + end + + %% Main flow + start_req --> svc_fetch_info + svc_fetch_info --> svc_check_balance + svc_fetch_info o--o boundary_fetch_err + boundary_fetch_err --> end_employee_not_found + svc_check_balance --> xgw_sufficient_balance + xgw_sufficient_balance -->|"Yes"| brt_approval + xgw_sufficient_balance -->|"No"| end_insufficient_balance + + %% 
Auto decision routing + brt_approval --> xgw_approval_result + xgw_approval_result -->|"Refused"| send_notify_refusal_auto + send_notify_refusal_auto --> end_refused_auto + xgw_approval_result -->|"Approved"| send_notify_approval_auto + send_notify_approval_auto --> svc_update_remaining_auto + svc_update_remaining_auto --> end_approved_auto + + %% HR review routing + xgw_approval_result -->|"Manual Validation Required"| xgw_needs_hr_review + xgw_needs_hr_review -->|"No"| user_manual_approve + xgw_needs_hr_review -->|"Yes"| send_notify_hr_committee_input + send_notify_hr_committee_input --> user_hr_committee + user_hr_committee --> icatch_hr_decision + icatch_hr_decision --> xgw_manual_result + user_hr_committee o--o boundary_hr_timer + boundary_hr_timer --> end_hr_timeout + user_manual_approve --> xgw_manual_result + + %% Manual result + xgw_manual_result -->|"Approved"| send_notify_approval_mgr + send_notify_approval_mgr --> svc_update_remaining_mgr + svc_update_remaining_mgr --> end_approved_manager + xgw_manual_result -->|"Refused"| send_notify_refusal_mgr + send_notify_refusal_mgr --> end_refused_manager diff --git a/data/26_it_3.JSON b/data/26_it_3.JSON new file mode 100644 index 0000000..9837a9f --- /dev/null +++ b/data/26_it_3.JSON @@ -0,0 +1,576 @@ +{ + "metadata": { + "id": "it_3_26", + "diagram_type": "network_topology", + "tier": 3, + "entity_count": 28, + "container_count": 11, + "attachment_count": 0, + "description": "Dual data center with active/standby load balancing and three-zone defense-in-depth architecture per DC: outer DMZ, Auth Zone (auth gateway + IAM + token cache), and App Zone (inner firewall + web app + LDAP + DB + internal logging server). A CDN edge fronts the global load balancer for user traffic. Each DC has its own Monitoring Zone with a local SIEM that ingests from its DC logging server; the two SIEMs cross-replicate to keep monitoring self-sufficient on failover. 
Users pass geofencing (zone + IP + telecom MFA) and IAM auth in the Auth Zone before the inner firewall admits them to the App Zone. Admin/audit users authenticate separately through the Auth Zone to access internal logging servers directly \u2014 no load balancer involved for log access. DB and app components write logs directly to the logging server. Log data replicates between DCs." + }, + "system_boundary": { + "id": "enterprise_net", + "name": "Enterprise Network", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "user_clients", + "name": "User Clients", + "type": "person", + "description": "End users accessing the web application via browser; receive SMS MFA codes on their mobile phones" + }, + { + "id": "global_lb", + "name": "Global Load Balancer", + "type": "external_system", + "description": "DNS-based / anycast global traffic manager. Routes requests to DC1 (active) and DC2 (failover)" + }, + { + "id": "geofencing_svc", + "name": "Geofencing Service", + "type": "external_system", + "description": "External SaaS that enforces geographic access policies. Validates user zone via IP geolocation and cell-tower location data provided by the telecom provider, then coordinates the MFA challenge" + }, + { + "id": "telecom_provider", + "name": "Telecom Provider", + "type": "external_system", + "description": "Mobile network operator. Supplies cell-tower location data to the geofencing service and delivers SMS MFA codes to user mobile devices" + }, + { + "id": "cdn_edge", + "name": "CDN Edge", + "type": "external_system", + "description": "Content delivery network edge cache. Terminates user TLS, serves cached static assets, and forwards dynamic requests to the global load balancer" + }, + { + "id": "dc1", + "name": "Data Center 1 \u2014 Primary", + "type": "deployment_environment", + "boundary": "enterprise_net", + "description": "Primary data center. 
Handles live traffic across three network zones: DMZ, Auth Zone, and App Zone" + }, + { + "id": "dmz1", + "name": "DMZ", + "type": "network_boundary", + "boundary": "dc1", + "description": "Outer perimeter zone in DC1. Terminates inbound TLS from the global load balancer and forwards to the Auth Zone; no direct path to the App Zone" + }, + { + "id": "outer_fw1", + "name": "Outer Firewall DC1", + "type": "device", + "technology": "Stateful Firewall", + "boundary": "dmz1", + "description": "Outer stateful firewall. Permits only HTTPS inbound and blocks all direct traffic to the App Zone" + }, + { + "id": "lb1", + "name": "Load Balancer DC1", + "type": "device", + "technology": "Application Load Balancer", + "boundary": "dmz1", + "description": "Application load balancer in the DC1 DMZ. Routes HTTPS traffic to the Auth Zone gateway" + }, + { + "id": "auth_zone1", + "name": "Auth Zone", + "type": "network_boundary", + "boundary": "dc1", + "description": "Authentication and policy-enforcement zone in DC1. All sessions must pass geofencing and IAM checks here before a token is issued and the inner firewall is opened" + }, + { + "id": "auth_gw1", + "name": "Auth Gateway DC1", + "type": "server", + "technology": "Policy Enforcement Point / Reverse Proxy", + "boundary": "auth_zone1", + "description": "Auth gateway acting as the policy enforcement point. Receives sessions from the LB, delegates auth to the IAM server and geofencing check, and forwards validated sessions through the inner firewall to the App Zone" + }, + { + "id": "iam_server", + "name": "IAM Server", + "type": "server", + "technology": "Identity and Access Management (IdP / Auth Server)", + "boundary": "auth_zone1", + "description": "Primary IAM server (e.g., Keycloak). Issues and validates JWT tokens, enforces RBAC policies, and calls the geofencing service for zone and MFA verification. 
Writes issued tokens to the token store" + }, + { + "id": "token_store", + "name": "Token Store", + "type": "server", + "technology": "Redis (Session / Token Cache)", + "boundary": "auth_zone1", + "description": "In-memory token and session cache in the Auth Zone. Enables fast token validation by the auth gateway without hitting the IAM server on every request" + }, + { + "id": "app_zone1", + "name": "App Zone", + "type": "network_boundary", + "boundary": "dc1", + "description": "Inner application zone in DC1. Only reachable through the inner firewall after a valid token has been issued in the Auth Zone" + }, + { + "id": "inner_fw1", + "name": "Inner Firewall DC1", + "type": "device", + "technology": "Stateful Firewall", + "boundary": "app_zone1", + "description": "Inner stateful firewall. Permits only traffic carrying a valid session token from the Auth Zone; blocks all unauthenticated paths" + }, + { + "id": "web_app_1", + "name": "Web App 1", + "type": "server", + "technology": "Web Application Server", + "boundary": "app_zone1", + "description": "Primary web application server. Processes authenticated requests from the inner firewall, queries LDAP for user attributes, and reads/writes the primary database" + }, + { + "id": "ldap_directory", + "name": "LDAP Directory", + "type": "server", + "technology": "LDAP / Active Directory", + "boundary": "app_zone1", + "description": "Authoritative user directory. Stores identities, group memberships, and access policies. Queried by the web app for authorisation decisions after the auth check" + }, + { + "id": "db_primary", + "name": "PostgreSQL Primary", + "type": "server", + "technology": "PostgreSQL (Read / Write)", + "boundary": "app_zone1", + "description": "Primary PostgreSQL database. 
Handles all read and write operations; streams WAL replication to DC2; ships structured access and edit audit events to the logging service" + }, + { + "id": "dc2", + "name": "Data Center 2 \u2014 Standby", + "type": "deployment_environment", + "boundary": "enterprise_net", + "description": "Standby data center. Receives continuous replication from DC1 across all three zones; promoted to active on DC1 failure" + }, + { + "id": "dmz2", + "name": "DMZ", + "type": "network_boundary", + "boundary": "dc2", + "description": "Outer perimeter zone in DC2. Ready to accept failover traffic" + }, + { + "id": "outer_fw2", + "name": "Outer Firewall DC2", + "type": "device", + "technology": "Stateful Firewall", + "boundary": "dmz2", + "description": "Outer stateful firewall in DC2 DMZ" + }, + { + "id": "lb2", + "name": "Load Balancer DC2", + "type": "device", + "technology": "Application Load Balancer", + "boundary": "dmz2", + "description": "Application load balancer in DC2 DMZ. Activated on failover" + }, + { + "id": "auth_zone2", + "name": "Auth Zone", + "type": "network_boundary", + "boundary": "dc2", + "description": "Authentication and policy-enforcement zone in DC2. Mirrors DC1 Auth Zone; uses replicated IAM and token store for standalone operation on failover" + }, + { + "id": "auth_gw2", + "name": "Auth Gateway DC2", + "type": "server", + "technology": "Policy Enforcement Point / Reverse Proxy", + "boundary": "auth_zone2", + "description": "Auth gateway in DC2 Auth Zone. Delegates authentication to the replicated IAM stack and forwards validated sessions through the inner firewall to the App Zone during failover" + }, + { + "id": "iam_replica", + "name": "IAM Replica", + "type": "server", + "technology": "Identity and Access Management (Replica)", + "boundary": "auth_zone2", + "description": "Replicated IAM server in DC2 Auth Zone. 
Stays in sync with the DC1 primary; provides full auth capability locally on failover" + }, + { + "id": "token_store_replica", + "name": "Token Store Replica", + "type": "server", + "technology": "Redis (Replica)", + "boundary": "auth_zone2", + "description": "Replica of the DC1 token cache. Ensures active sessions remain valid when traffic fails over to DC2" + }, + { + "id": "app_zone2", + "name": "App Zone", + "type": "network_boundary", + "boundary": "dc2", + "description": "Inner application zone in DC2. Only reachable after successful auth in the DC2 Auth Zone" + }, + { + "id": "inner_fw2", + "name": "Inner Firewall DC2", + "type": "device", + "technology": "Stateful Firewall", + "boundary": "app_zone2", + "description": "Inner stateful firewall in DC2 App Zone" + }, + { + "id": "web_app_2", + "name": "Web App 2", + "type": "server", + "technology": "Web Application Server", + "boundary": "app_zone2", + "description": "Standby web application server. Serves traffic when DC2 is promoted to active" + }, + { + "id": "ldap_replica", + "name": "LDAP Replica", + "type": "server", + "technology": "LDAP Read Replica", + "boundary": "app_zone2", + "description": "Read replica of the LDAP directory. Keeps DC2 self-sufficient for authorisation lookups on failover" + }, + { + "id": "db_replica", + "name": "PostgreSQL Replica", + "type": "server", + "technology": "PostgreSQL (Standby / Read)", + "boundary": "app_zone2", + "description": "Standby PostgreSQL replica. Receives streaming replication from DC1 primary; promoted to read-write on failover; ships read-access audit events to the logging service" + }, + { + "id": "admin_clients", + "name": "Admin / Audit Users", + "type": "person", + "description": "Administrators and auditors. 
Access the internal logging servers via a dedicated admin login through the Auth Zone \u2014 not through the user load balancer" + }, + { + "id": "logging_server_1", + "name": "Logging Server DC1", + "type": "server", + "technology": "Centralised Log Store", + "boundary": "app_zone1", + "description": "Internal centralised logging server in DC1 App Zone. Receives DB access and edit events from db_primary and app event logs from web_app_1. Accessible only to authenticated admin users via the Auth Zone inner firewall. Replicates log data to DC2." + }, + { + "id": "logging_server_2", + "name": "Logging Server DC2", + "type": "server", + "technology": "Centralised Log Store (Replica)", + "boundary": "app_zone2", + "description": "Internal centralised logging server in DC2 App Zone. Receives DB read events from db_replica and app event logs from web_app_2. Accessible to admins on failover. Receives replicated log data from DC1." + }, + { + "id": "monitoring_zone1", + "name": "Monitoring Zone DC1", + "type": "network_boundary", + "boundary": "dc1", + "description": "DC1 monitoring zone. Hosts the local SIEM that ingests logs from logging_server_1 and replicates state to DC2" + }, + { + "id": "siem_1", + "name": "SIEM DC1", + "type": "server", + "technology": "Security Information and Event Management", + "boundary": "monitoring_zone1", + "description": "Local SIEM in DC1. Pulls logs from logging_server_1, performs correlation and threat detection, and replicates state to the DC2 SIEM for failover continuity" + }, + { + "id": "monitoring_zone2", + "name": "Monitoring Zone DC2", + "type": "network_boundary", + "boundary": "dc2", + "description": "DC2 monitoring zone. Hosts the local SIEM that ingests logs from logging_server_2 and receives replicated state from DC1" + }, + { + "id": "siem_2", + "name": "SIEM DC2", + "type": "server", + "technology": "Security Information and Event Management (Replica)", + "boundary": "monitoring_zone2", + "description": "Local SIEM in DC2. 
Pulls logs from logging_server_2, receives replicated detection state from siem_1, and takes over correlation duties on failover" + } + ], + "relationships": [ + { + "id": "r1", + "source": "user_clients", + "target": "cdn_edge", + "label": "HTTPS", + "technology": "HTTPS" + }, + { + "id": "r1a", + "source": "cdn_edge", + "target": "global_lb", + "label": "Cache miss / dynamic", + "technology": "HTTPS" + }, + { + "id": "r2", + "source": "global_lb", + "target": "outer_fw1", + "label": "Active route", + "technology": "HTTPS" + }, + { + "id": "r3", + "source": "global_lb", + "target": "outer_fw2", + "label": "Failover route", + "technology": "HTTPS", + "note": "Dashed \u2014 standby path" + }, + { + "id": "r4", + "source": "outer_fw1", + "target": "lb1", + "label": "Filtered HTTPS" + }, + { + "id": "r5", + "source": "lb1", + "target": "auth_gw1", + "label": "HTTPS" + }, + { + "id": "r6", + "source": "auth_gw1", + "target": "iam_server", + "label": "Auth check", + "technology": "OAuth2 / OIDC" + }, + { + "id": "r7", + "source": "iam_server", + "target": "token_store", + "label": "Issue / validate token", + "technology": "Redis" + }, + { + "id": "r8", + "source": "iam_server", + "target": "geofencing_svc", + "label": "Zone + IP check", + "technology": "HTTPS / API" + }, + { + "id": "r9", + "source": "geofencing_svc", + "target": "telecom_provider", + "label": "Cell location + MFA trigger", + "technology": "Telecom API" + }, + { + "id": "r10", + "source": "telecom_provider", + "target": "user_clients", + "label": "SMS MFA code", + "technology": "SMS" + }, + { + "id": "r11", + "source": "auth_gw1", + "target": "inner_fw1", + "label": "Authenticated session", + "note": "Only after geofencing + IAM pass" + }, + { + "id": "r12", + "source": "inner_fw1", + "target": "web_app_1", + "label": "Filtered HTTP" + }, + { + "id": "r13", + "source": "web_app_1", + "target": "ldap_directory", + "label": "Authorisation lookup", + "technology": "LDAP" + }, + { + "id": "r14", + "source": 
"web_app_1", + "target": "db_primary", + "label": "Read / Write", + "technology": "SQL" + }, + { + "id": "r16", + "source": "db_primary", + "target": "db_replica", + "label": "Streaming replication", + "technology": "PostgreSQL WAL" + }, + { + "id": "r17", + "source": "iam_server", + "target": "iam_replica", + "label": "Sync", + "technology": "Internal replication" + }, + { + "id": "r18", + "source": "token_store", + "target": "token_store_replica", + "label": "Redis replication" + }, + { + "id": "r19", + "source": "ldap_directory", + "target": "ldap_replica", + "label": "LDAP replication" + }, + { + "id": "r20", + "source": "outer_fw1", + "target": "outer_fw2", + "label": "Encrypted WAN", + "technology": "IPsec", + "note": "Bidirectional inter-DC link" + }, + { + "id": "r21", + "source": "outer_fw2", + "target": "lb2", + "label": "Filtered HTTPS" + }, + { + "id": "r22", + "source": "lb2", + "target": "auth_gw2", + "label": "HTTPS" + }, + { + "id": "r23", + "source": "auth_gw2", + "target": "iam_replica", + "label": "Auth check", + "technology": "OAuth2 / OIDC" + }, + { + "id": "r24", + "source": "iam_replica", + "target": "token_store_replica", + "label": "Issue / validate token", + "technology": "Redis" + }, + { + "id": "r25", + "source": "auth_gw2", + "target": "inner_fw2", + "label": "Authenticated session", + "note": "Only after geofencing + IAM pass" + }, + { + "id": "r26", + "source": "inner_fw2", + "target": "web_app_2", + "label": "Filtered HTTP" + }, + { + "id": "r27", + "source": "web_app_2", + "target": "ldap_replica", + "label": "Authorisation lookup", + "technology": "LDAP" + }, + { + "id": "r28", + "source": "web_app_2", + "target": "db_replica", + "label": "Read / Standby", + "technology": "SQL" + }, + { + "id": "r30", + "source": "db_primary", + "target": "logging_server_1", + "label": "DB access + edit logs", + "technology": "Internal" + }, + { + "id": "r31", + "source": "web_app_1", + "target": "logging_server_1", + "label": "App event logs", + 
"technology": "Internal" + }, + { + "id": "r32", + "source": "db_replica", + "target": "logging_server_2", + "label": "DB read logs", + "technology": "Internal" + }, + { + "id": "r33", + "source": "web_app_2", + "target": "logging_server_2", + "label": "App event logs", + "technology": "Internal" + }, + { + "id": "r34", + "source": "logging_server_1", + "target": "logging_server_2", + "label": "Log replication" + }, + { + "id": "r35", + "source": "admin_clients", + "target": "auth_gw1", + "label": "Admin login \u00b7 VPN / internal", + "technology": "OAuth2 / OIDC", + "note": "Bypasses user LB" + }, + { + "id": "r36", + "source": "inner_fw1", + "target": "logging_server_1", + "label": "Admin read access", + "note": "Only admin-role tokens admitted" + }, + { + "id": "r37", + "source": "admin_clients", + "target": "auth_gw2", + "label": "Admin login \u00b7 failover", + "technology": "OAuth2 / OIDC" + }, + { + "id": "r38", + "source": "inner_fw2", + "target": "logging_server_2", + "label": "Admin read access" + }, + { + "id": "r39", + "source": "logging_server_1", + "target": "siem_1", + "label": "Log forwarding", + "technology": "Internal" + }, + { + "id": "r40", + "source": "logging_server_2", + "target": "siem_2", + "label": "Log forwarding", + "technology": "Internal" + }, + { + "id": "r41", + "source": "siem_1", + "target": "siem_2", + "label": "SIEM replication", + "technology": "Internal", + "note": "Bidirectional cross-DC threat correlation" + } + ] +} \ No newline at end of file diff --git a/data/26_it_3_ground_truth.MMD b/data/26_it_3_ground_truth.MMD new file mode 100644 index 0000000..456d38b --- /dev/null +++ b/data/26_it_3_ground_truth.MMD @@ -0,0 +1,125 @@ +--- +config: + theme: default +--- +flowchart LR + user_clients["User Clients\n[Person]\nWeb + Mobile"] + admin_clients["Admin / Audit Users\n[Person]"] + cdn_edge["CDN Edge\n[External System]\nEdge Cache + TLS"] + global_lb["Global Load Balancer\n[External System]\nDNS / Anycast"] + 
geofencing_svc["Geofencing Service\n[External System]\nZone + IP Check"] + telecom_provider["Telecom Provider\n[External System]\nCell Location + SMS MFA"] + + subgraph enterprise_net["Enterprise Network"] + subgraph dc1["Data Center 1 — Primary"] + subgraph dmz1["DMZ"] + outer_fw1["Outer Firewall DC1\n[Device]"] + lb1["Load Balancer DC1\n[Device]"] + end + subgraph auth_zone1["Auth Zone"] + auth_gw1["Auth Gateway DC1\n[Server]\nPolicy Enforcement Point"] + iam_server["IAM Server\n[Server]\nIdP / Auth Server"] + token_store[("Token Store\n[Server]\nRedis")] + end + subgraph app_zone1["App Zone"] + inner_fw1["Inner Firewall DC1\n[Device]"] + web_app_1["Web App 1\n[Server]"] + ldap_directory[("LDAP Directory\n[Server]")] + db_primary[("PostgreSQL Primary\n[Server]")] + logging_server_1[("Logging Server DC1\n[Server]\nAdmin access only")] + end + subgraph monitoring_zone1["Monitoring Zone"] + siem_1[("SIEM DC1\n[Server]\nCorrelation + Alerting")] + end + end + + subgraph dc2["Data Center 2 — Standby"] + subgraph dmz2["DMZ"] + outer_fw2["Outer Firewall DC2\n[Device]"] + lb2["Load Balancer DC2\n[Device]"] + end + subgraph auth_zone2["Auth Zone"] + auth_gw2["Auth Gateway DC2\n[Server]\nPolicy Enforcement Point"] + iam_replica["IAM Replica\n[Server]"] + token_store_replica[("Token Store Replica\n[Server]\nRedis")] + end + subgraph app_zone2["App Zone"] + inner_fw2["Inner Firewall DC2\n[Device]"] + web_app_2["Web App 2\n[Server]"] + ldap_replica[("LDAP Replica\n[Server]")] + db_replica[("PostgreSQL Replica\n[Server]")] + logging_server_2[("Logging Server DC2\n[Server]\nAdmin access only")] + end + subgraph monitoring_zone2["Monitoring Zone"] + siem_2[("SIEM DC2\n[Server]\nCorrelation + Alerting")] + end + end + end + + %% User traffic path + user_clients -->|"HTTPS"| cdn_edge + cdn_edge -->|"Cache miss / dynamic"| global_lb + global_lb -->|"Active"| outer_fw1 + global_lb -.->|"Failover"| outer_fw2 + + %% DC1 — outer DMZ + outer_fw1 -->|"Filtered HTTPS"| lb1 + lb1 
-->|"HTTPS"| auth_gw1 + + %% DC1 — Auth Zone: geofencing + IAM gate + auth_gw1 -->|"Auth check · OAuth2/OIDC"| iam_server + iam_server -->|"Issue / validate token"| token_store + iam_server -->|"Zone + IP check"| geofencing_svc + geofencing_svc -->|"Cell location + MFA trigger"| telecom_provider + telecom_provider -->|"SMS MFA code"| user_clients + + %% DC1 — Auth Zone → App Zone (only after auth passes) + auth_gw1 -->|"Authenticated session"| inner_fw1 + + %% DC1 — App Zone: user traffic + inner_fw1 -->|"Filtered HTTP"| web_app_1 + web_app_1 -->|"Authorisation lookup · LDAP"| ldap_directory + web_app_1 -->|"Read / Write · SQL"| db_primary + + %% DC1 — write logs (system-initiated, no LB) + db_primary -->|"DB access + edit logs"| logging_server_1 + web_app_1 -->|"App event logs"| logging_server_1 + + %% DC1 — admin read access (bypasses user LB, through Auth Zone) + admin_clients -->|"Admin login · VPN / internal"| auth_gw1 + inner_fw1 -->|"Admin read access"| logging_server_1 + + %% Cross-DC replication + db_primary -->|"Streaming replication · WAL"| db_replica + iam_server -->|"Sync"| iam_replica + token_store -->|"Redis replication"| token_store_replica + ldap_directory -->|"LDAP replication"| ldap_replica + logging_server_1 -->|"Log replication"| logging_server_2 + outer_fw1 <-->|"Encrypted WAN · IPsec"| outer_fw2 + + %% DC2 — outer DMZ + outer_fw2 -->|"Filtered HTTPS"| lb2 + lb2 -->|"HTTPS"| auth_gw2 + + %% DC2 — Auth Zone + auth_gw2 -->|"Auth check · OAuth2/OIDC"| iam_replica + iam_replica -->|"Issue / validate token"| token_store_replica + auth_gw2 -->|"Authenticated session"| inner_fw2 + + %% DC2 — App Zone: user traffic + inner_fw2 -->|"Filtered HTTP"| web_app_2 + web_app_2 -->|"Authorisation lookup · LDAP"| ldap_replica + web_app_2 -->|"Read / Standby · SQL"| db_replica + + %% DC2 — write logs + db_replica -->|"DB read logs"| logging_server_2 + web_app_2 -->|"App event logs"| logging_server_2 + + %% DC2 — admin read access on failover + admin_clients 
-->|"Admin login · failover"| auth_gw2 + inner_fw2 -->|"Admin read access"| logging_server_2 + + %% Per-DC SIEM aggregation + cross-DC replication + logging_server_1 -->|"Log forwarding"| siem_1 + logging_server_2 -->|"Log forwarding"| siem_2 + siem_1 <-->|"SIEM replication"| siem_2 diff --git a/data/27_it_3.JSON b/data/27_it_3.JSON new file mode 100644 index 0000000..01ed6cf --- /dev/null +++ b/data/27_it_3.JSON @@ -0,0 +1,292 @@ +{ + "metadata": { + "id": "it_3_27", + "diagram_type": "network_topology", + "tier": 3, + "entity_count": 28, + "container_count": 4, + "attachment_count": 0, + "description": "Two-office corporate network (HQ + branch) with AWS cloud hub. Each office has a full network stack (router, firewall, switch, VPN gateway, NAS, NVR + IP cameras, access control, POS, VoIP, WiFi AP, user clients). Both offices connect to AWS via site-to-site VPN. AWS hosts S3 backup buckets for NAS and video footage. Remote staff connect via client VPN to access company backups." + }, + "system_boundary": { + "id": "enterprise_wan", + "name": "Enterprise WAN", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "isp1", + "name": "ISP (Office 1)", + "type": "external_system", + "description": "Internet service provider for the HQ office. Provides WAN uplink to the HQ router" + }, + { + "id": "isp2", + "name": "ISP (Office 2)", + "type": "external_system", + "description": "Internet service provider for the branch office. Provides WAN uplink to the branch router" + }, + { + "id": "remote_users", + "name": "Remote Users", + "type": "person", + "description": "Company staff working remotely. Connect to the AWS VPN gateway via client VPN to access NAS backup and shared resources" + }, + { + "id": "office_1", + "name": "Office 1 — HQ", + "type": "network_boundary", + "boundary": "enterprise_wan", + "description": "Headquarters office LAN. 
Full network stack including security systems, POS, VoIP, and a VPN gateway tunnelling to AWS" + }, + { + "id": "router1", + "name": "Router (HQ)", + "type": "device", + "technology": "Router", + "boundary": "office_1", + "description": "Edge router at HQ. WAN uplink to ISP1, LAN downlink to the HQ firewall" + }, + { + "id": "firewall1", + "name": "Firewall (HQ)", + "type": "device", + "technology": "Firewall", + "boundary": "office_1", + "description": "Network firewall enforcing traffic policies between WAN and the HQ LAN" + }, + { + "id": "switch1", + "name": "Switch (HQ)", + "type": "device", + "technology": "Network Switch", + "boundary": "office_1", + "description": "Core HQ switch. Distributes traffic to all wired devices and the access point" + }, + { + "id": "vpn_gw1", + "name": "VPN Gateway (HQ)", + "type": "device", + "technology": "Site-to-Site VPN Appliance", + "boundary": "office_1", + "description": "VPN gateway at HQ. Establishes an encrypted site-to-site tunnel to the AWS VPN gateway. Routes backup traffic for NAS and NVR to AWS S3" + }, + { + "id": "nas1", + "name": "NAS (HQ)", + "type": "server", + "technology": "Network Attached Storage", + "boundary": "office_1", + "description": "Shared file storage for HQ. Wired to the switch. Backs up data to the AWS S3 NAS backup bucket via the VPN tunnel" + }, + { + "id": "nvr1", + "name": "NVR (HQ)", + "type": "server", + "technology": "Network Video Recorder", + "boundary": "office_1", + "description": "Network Video Recorder at HQ. Stores local security camera footage and replicates video archives to the AWS S3 video archive bucket via VPN" + }, + { + "id": "cameras1", + "name": "Security Cameras (HQ)", + "type": "device", + "technology": "IP Cameras (PoE)", + "boundary": "office_1", + "description": "IP security cameras at HQ, powered via PoE from the switch. 
Stream video to the HQ NVR" + }, + { + "id": "access_control1", + "name": "Access Control (HQ)", + "type": "server", + "technology": "Badge / Entry System", + "boundary": "office_1", + "description": "Badge reader and door controller server managing physical access at HQ. Wired to the switch" + }, + { + "id": "pos1", + "name": "POS Terminal (HQ)", + "type": "device", + "technology": "Point of Sale", + "boundary": "office_1", + "description": "Point-of-sale terminal at HQ. Wired to the switch" + }, + { + "id": "voip_phones1", + "name": "VoIP Phones (HQ)", + "type": "device", + "technology": "VoIP", + "boundary": "office_1", + "description": "Wired VoIP phones at HQ, connected to the switch" + }, + { + "id": "ap1", + "name": "Access Point (HQ)", + "type": "device", + "technology": "WiFi Access Point", + "boundary": "office_1", + "description": "Wireless access point at HQ providing WiFi connectivity to laptops" + }, + { + "id": "user_clients1", + "name": "User Clients (HQ)", + "type": "device", + "technology": "Laptops (WiFi)", + "boundary": "office_1", + "description": "Employee laptops at HQ, connecting wirelessly via the access point" + }, + { + "id": "office_2", + "name": "Office 2 — Branch", + "type": "network_boundary", + "boundary": "enterprise_wan", + "description": "Branch office LAN. Scaled-down network stack with VPN gateway connecting to the AWS hub" + }, + { + "id": "router2", + "name": "Router (Branch)", + "type": "device", + "technology": "Router", + "boundary": "office_2", + "description": "Edge router at the branch. WAN uplink to ISP2, LAN downlink to the branch firewall" + }, + { + "id": "firewall2", + "name": "Firewall (Branch)", + "type": "device", + "technology": "Firewall", + "boundary": "office_2", + "description": "Network firewall at the branch office" + }, + { + "id": "switch2", + "name": "Switch (Branch)", + "type": "device", + "technology": "Network Switch", + "boundary": "office_2", + "description": "Core branch switch. 
Distributes traffic to all wired devices and the access point" + }, + { + "id": "vpn_gw2", + "name": "VPN Gateway (Branch)", + "type": "device", + "technology": "Site-to-Site VPN Appliance", + "boundary": "office_2", + "description": "VPN gateway at the branch. Establishes an encrypted site-to-site tunnel to the AWS VPN gateway" + }, + { + "id": "nas2", + "name": "NAS (Branch)", + "type": "server", + "technology": "Network Attached Storage", + "boundary": "office_2", + "description": "Shared file storage for the branch. Backs up data to AWS S3 via the VPN tunnel" + }, + { + "id": "nvr2", + "name": "NVR (Branch)", + "type": "server", + "technology": "Network Video Recorder", + "boundary": "office_2", + "description": "Network Video Recorder at the branch. Replicates video archives to AWS S3 video archive via VPN" + }, + { + "id": "cameras2", + "name": "Security Cameras (Branch)", + "type": "device", + "technology": "IP Cameras (PoE)", + "boundary": "office_2", + "description": "IP security cameras at the branch, powered via PoE. Stream video to the branch NVR" + }, + { + "id": "pos2", + "name": "POS Terminal (Branch)", + "type": "device", + "technology": "Point of Sale", + "boundary": "office_2", + "description": "Point-of-sale terminal at the branch office" + }, + { + "id": "ap2", + "name": "Access Point (Branch)", + "type": "device", + "technology": "WiFi Access Point", + "boundary": "office_2", + "description": "Wireless access point at the branch providing WiFi connectivity" + }, + { + "id": "user_clients2", + "name": "User Clients (Branch)", + "type": "device", + "technology": "Laptops (WiFi)", + "boundary": "office_2", + "description": "Employee laptops at the branch, connecting wirelessly via the access point" + }, + { + "id": "aws_cloud", + "name": "AWS Cloud", + "type": "cloud_environment", + "boundary": "enterprise_wan", + "description": "AWS cloud hub. Hosts VPN gateway, NAS backup bucket, and video archive bucket. 
Acts as the central hub in a hub-and-spoke VPN topology connecting both offices and remote users" + }, + { + "id": "vpn_gw_aws", + "name": "AWS VPN Gateway", + "type": "server", + "technology": "AWS Virtual Private Gateway / Client VPN", + "boundary": "aws_cloud", + "description": "AWS VPN gateway. Terminates site-to-site tunnels from both offices and client VPN connections from remote users. Routes authenticated traffic to the S3 backup buckets" + }, + { + "id": "s3_nas_backup", + "name": "S3 NAS Backup", + "type": "server", + "technology": "AWS S3 Bucket", + "boundary": "aws_cloud", + "description": "S3 object storage bucket for NAS backups. Receives backup jobs from both office NAS systems via the VPN tunnel. Accessible to remote users via company VPN" + }, + { + "id": "s3_video_archive", + "name": "S3 Video Archive", + "type": "server", + "technology": "AWS S3 / S3 Glacier", + "boundary": "aws_cloud", + "description": "S3 / Glacier bucket for video footage archival. Receives video backup exports from both office NVRs via VPN. 
Long-term retention for compliance and security review" + } + ], + "relationships": [ + { "id":"r1", "source":"isp1", "target":"router1", "label":"WAN link" }, + { "id":"r2", "source":"router1", "target":"firewall1", "label":"WAN → LAN" }, + { "id":"r3", "source":"firewall1", "target":"switch1", "label":"Filtered traffic" }, + { "id":"r4", "source":"switch1", "target":"nas1", "label":"Wired" }, + { "id":"r5", "source":"switch1", "target":"nvr1", "label":"Wired" }, + { "id":"r6", "source":"switch1", "target":"cameras1", "label":"PoE" }, + { "id":"r7", "source":"cameras1", "target":"nvr1", "label":"Video stream" }, + { "id":"r8", "source":"switch1", "target":"access_control1","label":"Wired" }, + { "id":"r9", "source":"switch1", "target":"pos1", "label":"Wired" }, + { "id":"r10", "source":"switch1", "target":"voip_phones1", "label":"Wired" }, + { "id":"r11", "source":"switch1", "target":"ap1", "label":"Wired uplink" }, + { "id":"r12", "source":"ap1", "target":"user_clients1", "label":"WiFi" }, + { "id":"r13", "source":"switch1", "target":"vpn_gw1", "label":"Wired" }, + { "id":"r14", "source":"vpn_gw1", "target":"vpn_gw_aws", "label":"Site-to-site VPN", "technology":"IPsec" }, + { "id":"r15", "source":"nas1", "target":"s3_nas_backup", "label":"NAS backup via VPN", "technology":"S3 API" }, + { "id":"r16", "source":"nvr1", "target":"s3_video_archive","label":"Video archive via VPN","technology":"S3 API" }, + { "id":"r17", "source":"isp2", "target":"router2", "label":"WAN link" }, + { "id":"r18", "source":"router2", "target":"firewall2", "label":"WAN → LAN" }, + { "id":"r19", "source":"firewall2", "target":"switch2", "label":"Filtered traffic" }, + { "id":"r20", "source":"switch2", "target":"nas2", "label":"Wired" }, + { "id":"r21", "source":"switch2", "target":"nvr2", "label":"Wired" }, + { "id":"r22", "source":"switch2", "target":"cameras2", "label":"PoE" }, + { "id":"r23", "source":"cameras2", "target":"nvr2", "label":"Video stream" }, + { "id":"r24", 
"source":"switch2", "target":"pos2", "label":"Wired" }, + { "id":"r25", "source":"switch2", "target":"ap2", "label":"Wired uplink" }, + { "id":"r26", "source":"ap2", "target":"user_clients2", "label":"WiFi" }, + { "id":"r27", "source":"switch2", "target":"vpn_gw2", "label":"Wired" }, + { "id":"r28", "source":"vpn_gw2", "target":"vpn_gw_aws", "label":"Site-to-site VPN", "technology":"IPsec" }, + { "id":"r29", "source":"nas2", "target":"s3_nas_backup", "label":"NAS backup via VPN", "technology":"S3 API" }, + { "id":"r30", "source":"nvr2", "target":"s3_video_archive","label":"Video archive via VPN","technology":"S3 API" }, + { "id":"r31", "source":"remote_users", "target":"vpn_gw_aws", "label":"Client VPN", "technology":"VPN" }, + { "id":"r32", "source":"vpn_gw_aws", "target":"s3_nas_backup", "label":"Route to NAS backup" }, + { "id":"r33", "source":"vpn_gw_aws", "target":"s3_video_archive","label":"Route to video archive" } + ] +} diff --git a/data/27_it_3_ground_truth.MMD b/data/27_it_3_ground_truth.MMD new file mode 100644 index 0000000..918efc3 --- /dev/null +++ b/data/27_it_3_ground_truth.MMD @@ -0,0 +1,90 @@ +--- +config: + theme: default +--- +flowchart LR + isp1["ISP (Office 1)\n[External System]"] + isp2["ISP (Office 2)\n[External System]"] + remote_users["Remote Users\n[Person]\nClient VPN"] + + subgraph enterprise_wan["Enterprise WAN"] + subgraph office_1["Office 1 — HQ"] + direction LR + router1["Router\n[Device]"] + firewall1["Firewall\n[Device]"] + switch1["Switch\n[Device]"] + vpn_gw1["VPN Gateway\n[Device]\nSite-to-Site"] + nas1[("NAS\n[Server]\nFile Storage")] + nvr1[("NVR\n[Server]\nVideo Recorder")] + cameras1["Security Cameras\n[Device]\nIP PoE"] + access_control1[("Access Control\n[Server]\nBadge / Entry")] + pos1["POS Terminal\n[Device]"] + voip_phones1["VoIP Phones\n[Device]"] + ap1["Access Point\n[Device]\nWiFi"] + user_clients1["User Clients\n[Device]\nLaptops (WiFi)"] + end + + subgraph office_2["Office 2 — Branch"] + direction LR + 
router2["Router\n[Device]"] + firewall2["Firewall\n[Device]"] + switch2["Switch\n[Device]"] + vpn_gw2["VPN Gateway\n[Device]\nSite-to-Site"] + nas2[("NAS\n[Server]\nFile Storage")] + nvr2[("NVR\n[Server]\nVideo Recorder")] + cameras2["Security Cameras\n[Device]\nIP PoE"] + pos2["POS Terminal\n[Device]"] + ap2["Access Point\n[Device]\nWiFi"] + user_clients2["User Clients\n[Device]\nLaptops (WiFi)"] + end + + subgraph aws_cloud["AWS Cloud"] + direction LR + vpn_gw_aws["AWS VPN Gateway\n[Server]\nHub"] + s3_nas_backup[("S3 NAS Backup\n[Server]\nAWS S3")] + s3_video_archive[("S3 Video Archive\n[Server]\nAWS S3 / Glacier")] + end + end + + %% Office 1 internal + isp1 -->|"WAN link"| router1 + router1 -->|"WAN → LAN"| firewall1 + firewall1 -->|"Filtered traffic"| switch1 + switch1 -->|"Wired"| nas1 + switch1 -->|"Wired"| nvr1 + switch1 -->|"PoE"| cameras1 + cameras1 -->|"Video stream"| nvr1 + switch1 -->|"Wired"| access_control1 + switch1 -->|"Wired"| pos1 + switch1 -->|"Wired"| voip_phones1 + switch1 -->|"Wired uplink"| ap1 + ap1 -->|"WiFi"| user_clients1 + switch1 -->|"Wired"| vpn_gw1 + + %% Office 2 internal + isp2 -->|"WAN link"| router2 + router2 -->|"WAN → LAN"| firewall2 + firewall2 -->|"Filtered traffic"| switch2 + switch2 -->|"Wired"| nas2 + switch2 -->|"Wired"| nvr2 + switch2 -->|"PoE"| cameras2 + cameras2 -->|"Video stream"| nvr2 + switch2 -->|"Wired"| pos2 + switch2 -->|"Wired uplink"| ap2 + ap2 -->|"WiFi"| user_clients2 + switch2 -->|"Wired"| vpn_gw2 + + %% VPN tunnels to AWS hub + vpn_gw1 -->|"Site-to-site VPN · IPsec"| vpn_gw_aws + vpn_gw2 -->|"Site-to-site VPN · IPsec"| vpn_gw_aws + remote_users -->|"Client VPN"| vpn_gw_aws + + %% Backup flows via VPN + nas1 -->|"NAS backup via VPN · S3"| s3_nas_backup + nvr1 -->|"Video archive via VPN · S3"| s3_video_archive + nas2 -->|"NAS backup via VPN · S3"| s3_nas_backup + nvr2 -->|"Video archive via VPN · S3"| s3_video_archive + + %% Remote user access via VPN + vpn_gw_aws -->|"Route to NAS backup"| s3_nas_backup + 
vpn_gw_aws -->|"Route to video archive"| s3_video_archive diff --git a/data/28_it_3.JSON b/data/28_it_3.JSON new file mode 100644 index 0000000..179d509 --- /dev/null +++ b/data/28_it_3.JSON @@ -0,0 +1,258 @@ +{ + "metadata": { + "id": "it_3_28", + "diagram_type": "c4_container", + "tier": 3, + "entity_count": 26, + "container_count": 2, + "attachment_count": 0, + "description": "Extended GCP data analysis stack. Replaces Vertex AI Gemini with an external Claude API (Anthropic) called via the MCP server. Adds a second access path for external partners: CDN + Cloud Armor WAF + API Gateway with OAuth2 auth, routing to a separate internal external-client database (not the internal PostgreSQL). Includes a vector store for RAG context retrieval, Redis cache, event-driven notification pipeline (Cloud Tasks → background worker → Pub/Sub → notification service → partner webhook) with a Pub/Sub dead-letter queue for failed deliveries, Cloud Scheduler for periodic jobs, BigQuery analytics warehouse fed by the background worker and queried by the web app, and full observability via Cloud Monitoring, Error Reporting, and Secret Manager." + }, + "system_boundary": { + "id": "google_cloud", + "name": "Google Cloud", + "type": "cloud_environment" + }, + "elements": [ + { + "id": "internal_user", + "name": "Internal User", + "type": "person", + "description": "Internal analyst accessing the web app via browser through Google IAP" + }, + { + "id": "external_user", + "name": "External User", + "type": "person", + "description": "External partner or client accessing the web app through the public API path (CDN → WAF → API Gateway)" + }, + { + "id": "claude_api", + "name": "Claude API", + "type": "external_system", + "description": "Anthropic Claude API. Provides AI inference for the MCP server — replaces Vertex AI Gemini. 
Called over HTTPS with API key from Secret Manager" + }, + { + "id": "oauth_provider", + "name": "OAuth2 Provider", + "type": "external_system", + "description": "External OAuth2 / OIDC provider (e.g., Auth0). Authenticates external users and issues JWT tokens validated by the API Gateway" + }, + { + "id": "cdn", + "name": "CDN", + "type": "external_system", + "description": "Content delivery network (e.g., Cloudflare). Fronts external user traffic, terminates TLS at the edge, and forwards requests to Cloud Armor" + }, + { + "id": "external_webhook", + "name": "Partner Webhook", + "type": "external_system", + "description": "External partner endpoint that receives webhook notifications when background analysis jobs complete" + }, + { + "id": "iap", + "name": "Google IAP", + "type": "container", + "technology": "Identity-Aware Proxy", + "boundary": "google_cloud", + "description": "Google Identity-Aware Proxy. Enforces Google identity authentication for internal users before forwarding requests to the web app" + }, + { + "id": "cloud_armor", + "name": "Cloud Armor", + "type": "container", + "technology": "WAF / DDoS Protection", + "boundary": "google_cloud", + "description": "Google Cloud Armor. WAF and DDoS protection layer for the external access path. Filters malicious traffic from the CDN before it reaches the API Gateway" + }, + { + "id": "api_gateway", + "name": "API Gateway", + "type": "container", + "technology": "Cloud Endpoints / API Gateway", + "boundary": "google_cloud", + "description": "Cloud API Gateway for external users. Validates OAuth2 JWT tokens from the OAuth2 provider, enforces rate limits, and routes authenticated external requests to the web app" + }, + { + "id": "web_app", + "name": "Web App", + "type": "container", + "technology": "Web Application / MCP Host", + "boundary": "google_cloud", + "description": "Web application serving both internal and external users. 
Presents an internal view (IAP path) backed by the internal PostgreSQL and an external view (API Gateway path) backed by the external client database. Acts as MCP host for AI-assisted analysis" + }, + { + "id": "mcp_server", + "name": "MCP Server", + "type": "container", + "technology": "MCP Server", + "boundary": "google_cloud", + "description": "MCP server exposing data query tools to the web app. Calls the external Claude API for AI inference, queries the vector store for RAG context, and queries the internal orders view for structured data" + }, + { + "id": "vector_store", + "name": "Vector Store", + "type": "container", + "technology": "Vertex AI Vector Search / Embeddings", + "boundary": "google_cloud", + "description": "Vector database storing document embeddings for retrieval-augmented generation (RAG). Queried by the MCP server to fetch relevant context before calling the Claude API" + }, + { + "id": "redis_cache", + "name": "Redis Cache", + "type": "container", + "technology": "Cloud Memorystore (Redis)", + "boundary": "google_cloud", + "description": "In-memory cache for session data, API response caching, and rate-limit counters. Reduces load on the web app and databases" + }, + { + "id": "cloud_tasks", + "name": "Cloud Tasks", + "type": "container", + "technology": "GCP Cloud Tasks", + "boundary": "google_cloud", + "description": "Managed async task queue. Receives jobs enqueued by the web app and dispatches them to the background worker" + }, + { + "id": "scheduler", + "name": "Cloud Scheduler", + "type": "container", + "technology": "GCP Cloud Scheduler", + "boundary": "google_cloud", + "description": "Managed cron scheduler. 
Triggers periodic background worker jobs (e.g., nightly data exports, report generation)" + }, + { + "id": "background_worker", + "name": "Background Worker", + "type": "container", + "technology": "Cloud Run / Worker Service", + "boundary": "google_cloud", + "description": "Long-running worker triggered by Cloud Tasks and Cloud Scheduler. Processes jobs, reads/writes internal PostgreSQL, stores results in Cloud Storage, and publishes completion events to Pub/Sub" + }, + { + "id": "cloud_storage", + "name": "Cloud Storage", + "type": "container", + "technology": "GCS Bucket", + "boundary": "google_cloud", + "description": "GCS bucket. Stores job results, exports, and uploaded files from both the web app and background worker" + }, + { + "id": "pub_sub", + "name": "Pub/Sub", + "type": "container", + "technology": "GCP Cloud Pub/Sub", + "boundary": "google_cloud", + "description": "Managed event bus. Receives job completion events from the background worker and fans them out to the notification service" + }, + { + "id": "notification_svc", + "name": "Notification Service", + "type": "container", + "technology": "Cloud Run", + "boundary": "google_cloud", + "description": "Event-driven notification service subscribed to Pub/Sub. Delivers webhook payloads to registered external partner endpoints when analysis jobs complete" + }, + { + "id": "secret_manager", + "name": "Secret Manager", + "type": "container", + "technology": "GCP Secret Manager", + "boundary": "google_cloud", + "description": "Manages and vends secrets: Claude API key, OAuth2 client credentials, DB passwords, and webhook signing keys" + }, + { + "id": "cloud_monitoring", + "name": "Cloud Monitoring", + "type": "container", + "technology": "GCP Cloud Monitoring", + "boundary": "google_cloud", + "description": "Centralised metrics and logging. 
Collects telemetry from the web app and background worker" + }, + { + "id": "postgres", + "name": "PostgreSQL", + "type": "container", + "technology": "Cloud SQL (PostgreSQL)", + "boundary": "google_cloud", + "description": "Internal PostgreSQL database. Stores company data for internal users. The orders view is exposed read-only to the MCP server" + }, + { + "id": "orders_view", + "name": "orders", + "type": "container", + "technology": "Data Store (DB View)", + "boundary": "postgres", + "description": "Read-only PostgreSQL view over internal order data. Queried exclusively by the MCP server for AI-assisted analysis" + }, + { + "id": "external_client_db", + "name": "External Client DB", + "type": "container", + "technology": "Cloud SQL (PostgreSQL)", + "boundary": "google_cloud", + "description": "Separate internal database for external partner and client data. Accessed by the web app when serving external users through the API Gateway path. Kept isolated from the internal PostgreSQL by design" + }, + { + "id": "bigquery", + "name": "BigQuery", + "type": "container", + "technology": "GCP BigQuery", + "boundary": "google_cloud", + "description": "Analytics data warehouse. Receives processed job results from the background worker and serves analytics queries from the web app for internal dashboards" + }, + { + "id": "pubsub_dlq", + "name": "Notification DLQ", + "type": "container", + "technology": "GCP Pub/Sub (Dead-Letter Topic)", + "boundary": "google_cloud", + "description": "Dead-letter topic for the notification pipeline. Captures webhook deliveries that exhaust their retry budget so operators can replay or investigate them" + }, + { + "id": "error_reporting", + "name": "Error Reporting", + "type": "container", + "technology": "GCP Error Reporting", + "boundary": "google_cloud", + "description": "Centralised error aggregation. 
Collects unhandled exceptions and stack traces from the web app, background worker, and notification service" + } + ], + "relationships": [ + { "id":"r1", "source":"internal_user", "target":"iap", "label":"Authenticates · HTTPS", "technology":"HTTPS" }, + { "id":"r2", "source":"iap", "target":"web_app", "label":"Forwards authenticated request" }, + { "id":"r3", "source":"external_user", "target":"cdn", "label":"HTTPS", "technology":"HTTPS" }, + { "id":"r4", "source":"cdn", "target":"cloud_armor", "label":"Filtered HTTPS" }, + { "id":"r5", "source":"cloud_armor", "target":"api_gateway", "label":"WAF-filtered request" }, + { "id":"r6", "source":"api_gateway", "target":"oauth_provider", "label":"Validate JWT", "technology":"OAuth2 / OIDC" }, + { "id":"r7", "source":"api_gateway", "target":"web_app", "label":"Authenticated external request" }, + { "id":"r8", "source":"web_app", "target":"mcp_server", "label":"Calls MCP tools", "technology":"MCP" }, + { "id":"r9", "source":"mcp_server", "target":"claude_api", "label":"AI inference", "technology":"Anthropic API" }, + { "id":"r10", "source":"mcp_server", "target":"vector_store", "label":"Fetch RAG context", "technology":"Vector search" }, + { "id":"r11", "source":"mcp_server", "target":"orders_view", "label":"Read-only query", "technology":"SQL" }, + { "id":"r12", "source":"web_app", "target":"redis_cache", "label":"Session / response cache", "technology":"Redis" }, + { "id":"r13", "source":"web_app", "target":"postgres", "label":"Internal user data", "technology":"SQL" }, + { "id":"r14", "source":"web_app", "target":"external_client_db","label":"External client data", "technology":"SQL" }, + { "id":"r15", "source":"web_app", "target":"cloud_tasks", "label":"Enqueue async job", "technology":"Cloud Tasks API" }, + { "id":"r16", "source":"web_app", "target":"cloud_storage", "label":"Upload / download files", "technology":"GCS API" }, + { "id":"r17", "source":"web_app", "target":"secret_manager", "label":"Fetch secrets" }, + 
{ "id":"r18", "source":"web_app", "target":"cloud_monitoring", "label":"Metrics / logs" }, + { "id":"r19", "source":"cloud_tasks", "target":"background_worker","label":"Trigger job", "technology":"HTTP callback" }, + { "id":"r20", "source":"scheduler", "target":"background_worker","label":"Periodic trigger", "technology":"HTTP" }, + { "id":"r21", "source":"background_worker", "target":"postgres", "label":"Read / write data", "technology":"SQL" }, + { "id":"r22", "source":"background_worker", "target":"cloud_storage", "label":"Store results", "technology":"GCS API" }, + { "id":"r23", "source":"background_worker", "target":"pub_sub", "label":"Publish job result", "technology":"Pub/Sub" }, + { "id":"r24", "source":"background_worker", "target":"secret_manager", "label":"Fetch credentials" }, + { "id":"r25", "source":"background_worker", "target":"cloud_monitoring", "label":"Metrics / logs" }, + { "id":"r26", "source":"pub_sub", "target":"notification_svc", "label":"Job complete event" }, + { "id":"r27", "source":"notification_svc", "target":"external_webhook", "label":"Webhook delivery", "technology":"HTTPS" }, + { "id":"r28", "source":"notification_svc", "target":"secret_manager", "label":"Fetch signing key" }, + { "id":"r29", "source":"background_worker", "target":"bigquery", "label":"Load processed results", "technology":"BigQuery API" }, + { "id":"r30", "source":"web_app", "target":"bigquery", "label":"Analytics queries", "technology":"BigQuery API" }, + { "id":"r31", "source":"notification_svc", "target":"pubsub_dlq", "label":"Failed delivery", "technology":"Pub/Sub" }, + { "id":"r32", "source":"web_app", "target":"error_reporting", "label":"Exceptions" }, + { "id":"r33", "source":"background_worker", "target":"error_reporting", "label":"Exceptions" }, + { "id":"r34", "source":"notification_svc", "target":"error_reporting", "label":"Exceptions" } + ] +} diff --git a/data/28_it_3_ground_truth.MMD b/data/28_it_3_ground_truth.MMD new file mode 100644 index 
0000000..d16571d --- /dev/null +++ b/data/28_it_3_ground_truth.MMD @@ -0,0 +1,84 @@ +--- +config: + theme: default +--- +flowchart LR + internal_user["Internal User\n[Person]"] + external_user["External User\n[Person]\nPartner / Client"] + claude_api["Claude API\n[External System]\nAnthropic"] + oauth_provider["OAuth2 Provider\n[External System]\nAuth0 / OIDC"] + cdn["CDN\n[External System]\nEdge / Cloudflare"] + external_webhook["Partner Webhook\n[External System]"] + + subgraph google_cloud["Google Cloud"] + direction LR + iap["Google IAP\n[Container]\nIdentity-Aware Proxy"] + cloud_armor["Cloud Armor\n[Container]\nWAF / DDoS"] + api_gateway["API Gateway\n[Container]\nCloud Endpoints"] + web_app["Web App\n[Container]\nMCP Host"] + mcp_server["MCP Server\n[Container]"] + vector_store[("Vector Store\n[Container]\nRAG Embeddings")] + redis_cache[("Redis Cache\n[Container]\nMemorystore")] + cloud_tasks["Cloud Tasks\n[Container]\nAsync Queue"] + scheduler["Cloud Scheduler\n[Container]"] + background_worker["Background Worker\n[Container]"] + cloud_storage[("Cloud Storage\n[Container]\nGCS Bucket")] + pub_sub["Pub/Sub\n[Container]\nEvent Bus"] + notification_svc["Notification Service\n[Container]\nCloud Run"] + secret_manager["Secret Manager\n[Container]"] + cloud_monitoring["Cloud Monitoring\n[Container]"] + error_reporting["Error Reporting\n[Container]\nException Aggregation"] + external_client_db[("External Client DB\n[Container]\nCloud SQL")] + bigquery[("BigQuery\n[Container]\nAnalytics Warehouse")] + pubsub_dlq["Notification DLQ\n[Container]\nPub/Sub Dead-Letter"] + + subgraph postgres["PostgreSQL"] + orders_view[("orders\n[Data Store]\nDB View")] + end + end + + %% Internal user path + internal_user -->|"Authenticates · HTTPS"| iap + iap -->|"Forwards authenticated request"| web_app + + %% External user path + external_user -->|"HTTPS"| cdn + cdn -->|"Filtered HTTPS"| cloud_armor + cloud_armor -->|"WAF-filtered request"| api_gateway + api_gateway -->|"Validate 
JWT · OAuth2"| oauth_provider + api_gateway -->|"Authenticated external request"| web_app + + %% Web app → MCP + AI + web_app -->|"Calls MCP tools · MCP"| mcp_server + mcp_server -->|"AI inference · Anthropic API"| claude_api + mcp_server -->|"Fetch RAG context"| vector_store + mcp_server -->|"Read-only query · SQL"| orders_view + + %% Web app → data + cache + web_app -->|"Session / response cache"| redis_cache + web_app -->|"Internal user data · SQL"| postgres + web_app -->|"External client data · SQL"| external_client_db + web_app -->|"Analytics queries"| bigquery + web_app -->|"Upload / download · GCS"| cloud_storage + web_app -->|"Fetch secrets"| secret_manager + web_app -->|"Metrics / logs"| cloud_monitoring + web_app -->|"Exceptions"| error_reporting + + %% Async pipeline + web_app -->|"Enqueue async job"| cloud_tasks + cloud_tasks -->|"Trigger job"| background_worker + scheduler -->|"Periodic trigger"| background_worker + background_worker -->|"Read / write · SQL"| postgres + background_worker -->|"Store results · GCS"| cloud_storage + background_worker -->|"Load processed results"| bigquery + background_worker -->|"Publish job result"| pub_sub + background_worker -->|"Fetch credentials"| secret_manager + background_worker -->|"Metrics / logs"| cloud_monitoring + background_worker -->|"Exceptions"| error_reporting + + %% Notification pipeline + pub_sub -->|"Job complete event"| notification_svc + notification_svc -->|"Webhook delivery · HTTPS"| external_webhook + notification_svc -->|"Failed delivery"| pubsub_dlq + notification_svc -->|"Fetch signing key"| secret_manager + notification_svc -->|"Exceptions"| error_reporting diff --git a/data/29_it_3.JSON b/data/29_it_3.JSON new file mode 100644 index 0000000..4616260 --- /dev/null +++ b/data/29_it_3.JSON @@ -0,0 +1,298 @@ +{ + "metadata": { + "id": "it_3_29", + "diagram_type": "c4_container", + "tier": 3, + "entity_count": 27, + "container_count": 5, + "attachment_count": 0, + "description": "Full SomeApp 
delivery and runtime stack on Infomaniak Public Cloud. Three explicit sub-environments: CI/CD (GitLab runner + Terraform + test suite + HashiCorp Vault + artifact registry), Staging (mirrored runtime: web app + background worker + Redis + PostgreSQL + object storage + monitoring + log server), and Production (web app + background worker + Redis + PostgreSQL + object storage + backup storage + Prometheus monitoring + internal log server). Cloudflare provides CDN and WAF for production. PostHog is modelled as a vendor boundary containing two product subsystems: Product Analytics (event tracking) and Error Tracking (exception aggregation, replacing a Sentry-style service); both are consumed from the production web app." + }, + "system_boundary": { + "id": "infomaniak", + "name": "Infomaniak Public Cloud", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "user", + "name": "User", + "type": "person", + "description": "End user accessing SomeApp via browser (someapp.xx)" + }, + { + "id": "developer", + "name": "Developer", + "type": "person", + "description": "Developer pushing code, managing infrastructure, and reviewing pipeline results" + }, + { + "id": "cloudflare", + "name": "Cloudflare", + "type": "external_system", + "description": "CDN, WAF, and edge proxy. Terminates HTTPS at the edge, applies WAF rules, and reverse-proxies to the production web app on Infomaniak" + }, + { + "id": "gitlab", + "name": "GitLab", + "type": "external_system", + "description": "Source code repository and CI/CD orchestrator. Hosts the codebase and triggers pipeline runs on push or merge" + }, + { + "id": "infomaniak_api", + "name": "Infomaniak API", + "type": "external_system", + "description": "Infomaniak cloud management API. Used by Terraform to provision and update cloud resources" + }, + { + "id": "posthog", + "name": "PostHog", + "type": "external_system_boundary", + "description": "External PostHog vendor boundary. 
Groups the two distinct PostHog products consumed by the platform: Product Analytics and Error Tracking" + }, + { + "id": "posthog_analytics", + "name": "PostHog Product Analytics", + "type": "external_system", + "boundary": "posthog", + "description": "PostHog product analytics service. Receives user interaction events from the production web app for funnel analysis, feature flag evaluation, and session recording" + }, + { + "id": "posthog_errors", + "name": "PostHog Error Tracking", + "type": "external_system", + "boundary": "posthog", + "description": "PostHog error tracking service. Receives unhandled exceptions and stack traces from the production web app and background worker (Sentry-style replacement)" + }, + { + "id": "cicd_env", + "name": "CI/CD Environment", + "type": "deployment_environment", + "boundary": "infomaniak", + "description": "Dedicated Infomaniak environment hosting the CI/CD pipeline components. Isolated from staging and production" + }, + { + "id": "ci_runner", + "name": "GitLab CI Runner", + "type": "container", + "technology": "GitLab Runner", + "boundary": "cicd_env", + "description": "Self-hosted CI runner executing pipeline jobs: build, test, package, and deploy stages" + }, + { + "id": "terraform", + "name": "Terraform", + "type": "container", + "technology": "Infrastructure as Code", + "boundary": "cicd_env", + "description": "Terraform executed by the CI runner to provision and update Infomaniak resources across staging and production environments" + }, + { + "id": "test_suite", + "name": "Test Suite", + "type": "container", + "technology": "Automated Tests", + "boundary": "cicd_env", + "description": "Unit, integration, and end-to-end test suite executed by the CI runner. Gate for progression to staging and production deployments" + }, + { + "id": "vault", + "name": "HashiCorp Vault", + "type": "container", + "technology": "Secrets Management", + "boundary": "cicd_env", + "description": "Central secrets management server. 
Vends database credentials, API keys (PostHog, Cloudflare, Infomaniak), and signing certificates to the CI runner for deployments and to the production web app at runtime" + }, + { + "id": "artifact_registry", + "name": "Artifact Registry", + "type": "container", + "technology": "Docker / Container Registry", + "boundary": "cicd_env", + "description": "Internal container image registry. The CI runner pushes built Docker images here; staging and production pull verified images for deployment" + }, + { + "id": "staging_env", + "name": "Staging", + "type": "deployment_environment", + "boundary": "infomaniak", + "description": "Pre-production staging environment mirroring production topology. Used for final validation before production deployment" + }, + { + "id": "web_app_staging", + "name": "SomeApp (Staging)", + "type": "container", + "technology": "Web Application", + "boundary": "staging_env", + "description": "Staging instance of the SomeApp web application. Receives deployments from the CI runner for pre-production testing" + }, + { + "id": "postgres_staging", + "name": "PostgreSQL (Staging)", + "type": "container", + "technology": "Relational Database", + "boundary": "staging_env", + "description": "Staging PostgreSQL database with anonymised production-like data. Used exclusively by the staging web app" + }, + { + "id": "object_storage_staging", + "name": "Object Storage (Staging)", + "type": "container", + "technology": "S3-compatible Bucket", + "boundary": "staging_env", + "description": "S3-compatible staging bucket. Mirrors the production storage layout for end-to-end testing of file upload/download flows" + }, + { + "id": "redis_cache_staging", + "name": "Redis Cache (Staging)", + "type": "container", + "technology": "Redis", + "boundary": "staging_env", + "description": "Staging Redis cache. 
Mirrors the production cache layer for end-to-end validation of session and rate-limit behaviour" + }, + { + "id": "background_worker_staging", + "name": "Background Worker (Staging)", + "type": "container", + "technology": "Async Worker", + "boundary": "staging_env", + "description": "Staging instance of the background worker. Validates async job processing before production deployment" + }, + { + "id": "monitoring_staging", + "name": "Monitoring (Staging)", + "type": "container", + "technology": "Prometheus + Grafana", + "boundary": "staging_env", + "description": "Staging Prometheus and Grafana stack. Scrapes metrics from the staging web app, worker, and PostgreSQL for pre-production observability validation" + }, + { + "id": "log_server_staging", + "name": "Log Server (Staging)", + "type": "container", + "technology": "Centralised Log Store", + "boundary": "staging_env", + "description": "Staging centralised log server. Receives application event logs and DB audit logs from the staging environment" + }, + { + "id": "prod_env", + "name": "Production", + "type": "deployment_environment", + "boundary": "infomaniak", + "description": "Live production environment serving real users. Receives deployments only after staging validation passes" + }, + { + "id": "web_app", + "name": "SomeApp", + "type": "container", + "technology": "Web Application", + "boundary": "prod_env", + "description": "Production web application. Serves user traffic proxied from Cloudflare, reads/writes PostgreSQL and object storage, caches via Redis, tracks events in PostHog, and fetches runtime secrets from Vault" + }, + { + "id": "redis_cache", + "name": "Redis Cache", + "type": "container", + "technology": "Redis", + "boundary": "prod_env", + "description": "In-memory cache for sessions, API responses, and rate-limit counters. 
Reduces database load" + }, + { + "id": "postgres", + "name": "PostgreSQL", + "type": "container", + "technology": "Relational Database", + "boundary": "prod_env", + "description": "Primary production PostgreSQL database. Handles all application reads and writes; ships access and edit events to the internal log server" + }, + { + "id": "object_storage", + "name": "Object Storage", + "type": "container", + "technology": "S3-compatible Bucket", + "boundary": "prod_env", + "description": "Primary S3-compatible object storage for user-uploaded files and application assets" + }, + { + "id": "backup_storage", + "name": "Backup Storage", + "type": "container", + "technology": "S3-compatible Bucket (Backup)", + "boundary": "prod_env", + "description": "Dedicated S3-compatible bucket for database and file backups. Receives scheduled backup exports from PostgreSQL and object storage" + }, + { + "id": "monitoring", + "name": "Monitoring", + "type": "container", + "technology": "Prometheus + Grafana", + "boundary": "prod_env", + "description": "Internal Prometheus metrics collection and Grafana dashboards. Scrapes metrics from the web app and PostgreSQL; accessible only to the developer via admin login" + }, + { + "id": "log_server", + "name": "Log Server", + "type": "container", + "technology": "Centralised Log Store", + "boundary": "prod_env", + "description": "Internal centralised log server. Receives application event logs from the web app and DB access/edit audit logs from PostgreSQL. Accessible only to the developer via admin login — not exposed through the user path" + }, + { + "id": "background_worker", + "name": "Background Worker", + "type": "container", + "technology": "Async Worker", + "boundary": "prod_env", + "description": "Production background worker. 
Processes async jobs enqueued by the web app, reads/writes PostgreSQL and object storage, and reports errors to PostHog Error Tracking" + } + ], + "relationships": [ + { "id":"r1", "source":"developer", "target":"gitlab", "label":"Push code", "technology":"Git / HTTPS" }, + { "id":"r2", "source":"gitlab", "target":"ci_runner", "label":"Trigger pipeline", "technology":"GitLab CI" }, + { "id":"r3", "source":"ci_runner", "target":"test_suite", "label":"Execute tests" }, + { "id":"r4", "source":"ci_runner", "target":"vault", "label":"Fetch deploy secrets" }, + { "id":"r5", "source":"ci_runner", "target":"terraform", "label":"Run plan / apply" }, + { "id":"r6", "source":"terraform", "target":"infomaniak_api", "label":"Provision infrastructure", "technology":"REST API" }, + { "id":"r7", "source":"ci_runner", "target":"artifact_registry", "label":"Push image", "technology":"Docker" }, + { "id":"r8", "source":"ci_runner", "target":"web_app_staging", "label":"Deploy to staging", "technology":"SSH / Docker" }, + { "id":"r9", "source":"artifact_registry", "target":"web_app_staging", "label":"Pull verified image", "technology":"Docker" }, + { "id":"r10", "source":"web_app_staging", "target":"postgres_staging", "label":"Read / write", "technology":"SQL" }, + { "id":"r11", "source":"web_app_staging", "target":"object_storage_staging","label":"Files", "technology":"S3 API" }, + { "id":"r12", "source":"ci_runner", "target":"web_app", "label":"Deploy to production", "technology":"SSH / Docker" }, + { "id":"r13", "source":"artifact_registry", "target":"web_app", "label":"Pull verified image", "technology":"Docker" }, + { "id":"r14", "source":"user", "target":"cloudflare", "label":"HTTPS", "technology":"HTTPS", "note":"someapp.xx" }, + { "id":"r15", "source":"cloudflare", "target":"web_app", "label":"Reverse proxy · WAF-filtered", "technology":"HTTPS" }, + { "id":"r16", "source":"web_app", "target":"redis_cache", "label":"Cache", "technology":"Redis" }, + { "id":"r17", 
"source":"web_app", "target":"postgres", "label":"Read / write", "technology":"SQL" }, + { "id":"r18", "source":"web_app", "target":"object_storage", "label":"Files", "technology":"S3 API" }, + { "id":"r19", "source":"web_app", "target":"posthog_analytics", "label":"Track events", "technology":"HTTPS / PostHog SDK" }, + { "id":"r19a","source":"web_app", "target":"posthog_errors", "label":"Report exceptions", "technology":"HTTPS / PostHog SDK" }, + { "id":"r20", "source":"web_app", "target":"vault", "label":"Fetch runtime secrets" }, + { "id":"r21", "source":"web_app", "target":"monitoring", "label":"Metrics", "technology":"Prometheus" }, + { "id":"r22", "source":"web_app", "target":"log_server", "label":"App event logs" }, + { "id":"r23", "source":"postgres", "target":"backup_storage", "label":"Scheduled backup", "technology":"S3 API" }, + { "id":"r24", "source":"postgres", "target":"log_server", "label":"DB access + edit audit logs" }, + { "id":"r25", "source":"developer", "target":"monitoring", "label":"Admin access" }, + { "id":"r26", "source":"developer", "target":"log_server", "label":"Admin / audit access" }, + { "id":"r27", "source":"web_app", "target":"background_worker", "label":"Enqueue async job" }, + { "id":"r28", "source":"background_worker", "target":"postgres", "label":"Read / write", "technology":"SQL" }, + { "id":"r29", "source":"background_worker", "target":"object_storage", "label":"Files", "technology":"S3 API" }, + { "id":"r30", "source":"background_worker", "target":"vault", "label":"Fetch runtime secrets" }, + { "id":"r31", "source":"background_worker", "target":"monitoring", "label":"Metrics", "technology":"Prometheus" }, + { "id":"r32", "source":"background_worker", "target":"log_server", "label":"Worker event logs" }, + { "id":"r33", "source":"background_worker", "target":"posthog_errors", "label":"Report exceptions", "technology":"HTTPS / PostHog SDK" }, + { "id":"r34", "source":"web_app_staging", "target":"redis_cache_staging", 
"label":"Cache", "technology":"Redis" }, + { "id":"r35", "source":"web_app_staging", "target":"background_worker_staging","label":"Enqueue async job" }, + { "id":"r36", "source":"background_worker_staging","target":"postgres_staging", "label":"Read / write", "technology":"SQL" }, + { "id":"r37", "source":"background_worker_staging","target":"object_storage_staging","label":"Files", "technology":"S3 API" }, + { "id":"r38", "source":"web_app_staging", "target":"monitoring_staging", "label":"Metrics", "technology":"Prometheus" }, + { "id":"r39", "source":"background_worker_staging","target":"monitoring_staging", "label":"Metrics", "technology":"Prometheus" }, + { "id":"r40", "source":"web_app_staging", "target":"log_server_staging", "label":"App event logs" }, + { "id":"r41", "source":"background_worker_staging","target":"log_server_staging", "label":"Worker event logs" }, + { "id":"r42", "source":"postgres_staging", "target":"log_server_staging", "label":"DB audit logs" }, + { "id":"r43", "source":"ci_runner", "target":"background_worker_staging","label":"Deploy to staging", "technology":"SSH / Docker" }, + { "id":"r44", "source":"artifact_registry", "target":"background_worker_staging","label":"Pull verified image", "technology":"Docker" }, + { "id":"r45", "source":"ci_runner", "target":"background_worker", "label":"Deploy to production", "technology":"SSH / Docker" }, + { "id":"r46", "source":"artifact_registry", "target":"background_worker", "label":"Pull verified image", "technology":"Docker" } + ] +} diff --git a/data/29_it_3_ground_truth.MMD b/data/29_it_3_ground_truth.MMD new file mode 100644 index 0000000..90ab812 --- /dev/null +++ b/data/29_it_3_ground_truth.MMD @@ -0,0 +1,117 @@ +--- +config: + theme: default +--- +flowchart LR + user["User\n[Person]"] + developer["Developer\n[Person]"] + cloudflare["Cloudflare\n[External System]\nCDN + WAF"] + gitlab["GitLab\n[External System]\nSource Control / CI"] + infomaniak_api["Infomaniak API\n[External 
System]\nCloud Management"] + + subgraph posthog["PostHog"] + direction LR + posthog_analytics["Product Analytics\n[External System]\nEvent Tracking"] + posthog_errors["Error Tracking\n[External System]\nException Aggregation"] + end + + subgraph infomaniak["Infomaniak Public Cloud"] + direction LR + + subgraph cicd_env["CI/CD Environment"] + direction LR + ci_runner["GitLab CI Runner\n[Container]"] + terraform["Terraform\n[Container]\nInfrastructure as Code"] + test_suite["Test Suite\n[Container]\nAutomated Tests"] + vault["HashiCorp Vault\n[Container]\nSecrets Management"] + artifact_registry["Artifact Registry\n[Container]\nDocker Images"] + end + + subgraph staging_env["Staging"] + direction LR + web_app_staging["SomeApp\n[Container]\nStaging"] + background_worker_staging["Background Worker\n[Container]\nStaging"] + redis_cache_staging[("Redis Cache\n[Container]\nStaging")] + postgres_staging[("PostgreSQL\n[Container]\nStaging DB")] + object_storage_staging[("Object Storage\n[Container]\nStaging Bucket")] + monitoring_staging["Monitoring\n[Container]\nStaging Prometheus"] + log_server_staging[("Log Server\n[Container]\nStaging")] + end + + subgraph prod_env["Production"] + direction LR + web_app["SomeApp\n[Container]\nProduction"] + background_worker["Background Worker\n[Container]\nProduction"] + redis_cache[("Redis Cache\n[Container]")] + postgres[("PostgreSQL\n[Container]\nPrimary DB")] + object_storage[("Object Storage\n[Container]\nPrimary Bucket")] + backup_storage[("Backup Storage\n[Container]\nBackup Bucket")] + monitoring["Monitoring\n[Container]\nPrometheus + Grafana"] + log_server[("Log Server\n[Container]\nAdmin access only")] + end + end + + %% CI/CD pipeline + developer -->|"Push code · Git"| gitlab + gitlab -->|"Trigger pipeline · GitLab CI"| ci_runner + ci_runner -->|"Execute tests"| test_suite + ci_runner -->|"Fetch deploy secrets"| vault + ci_runner -->|"Run plan / apply"| terraform + terraform -->|"Provision infra · REST API"| infomaniak_api 
+ ci_runner -->|"Push image · Docker"| artifact_registry + + %% Deploy to staging + ci_runner -->|"Deploy to staging · SSH"| web_app_staging + ci_runner -->|"Deploy to staging · SSH"| background_worker_staging + artifact_registry -->|"Pull verified image"| web_app_staging + artifact_registry -->|"Pull verified image"| background_worker_staging + + %% Staging app internals + web_app_staging -->|"Cache · Redis"| redis_cache_staging + web_app_staging -->|"Read / write · SQL"| postgres_staging + web_app_staging -->|"Files · S3"| object_storage_staging + web_app_staging -->|"Enqueue async job"| background_worker_staging + background_worker_staging -->|"Read / write · SQL"| postgres_staging + background_worker_staging -->|"Files · S3"| object_storage_staging + web_app_staging -->|"Metrics · Prometheus"| monitoring_staging + background_worker_staging -->|"Metrics · Prometheus"| monitoring_staging + web_app_staging -->|"App event logs"| log_server_staging + background_worker_staging -->|"Worker event logs"| log_server_staging + postgres_staging -->|"DB audit logs"| log_server_staging + + %% Deploy to production + ci_runner -->|"Deploy to production · SSH"| web_app + ci_runner -->|"Deploy to production · SSH"| background_worker + artifact_registry -->|"Pull verified image"| web_app + artifact_registry -->|"Pull verified image"| background_worker + + %% Production user path + user -->|"HTTPS · someapp.xx"| cloudflare + cloudflare -->|"Reverse proxy · WAF-filtered"| web_app + + %% Production app internals + web_app -->|"Cache · Redis"| redis_cache + web_app -->|"Read / write · SQL"| postgres + web_app -->|"Files · S3"| object_storage + web_app -->|"Enqueue async job"| background_worker + web_app -->|"Track events · PostHog SDK"| posthog_analytics + web_app -->|"Report exceptions · PostHog SDK"| posthog_errors + web_app -->|"Fetch runtime secrets"| vault + web_app -->|"Metrics · Prometheus"| monitoring + web_app -->|"App event logs"| log_server + + %% Production background 
worker + background_worker -->|"Read / write · SQL"| postgres + background_worker -->|"Files · S3"| object_storage + background_worker -->|"Fetch runtime secrets"| vault + background_worker -->|"Metrics · Prometheus"| monitoring + background_worker -->|"Worker event logs"| log_server + background_worker -->|"Report exceptions · PostHog SDK"| posthog_errors + + %% DB backup + audit + postgres -->|"Scheduled backup · S3"| backup_storage + postgres -->|"DB access + edit audit logs"| log_server + + %% Developer admin access + developer -->|"Admin access"| monitoring + developer -->|"Admin / audit access"| log_server \ No newline at end of file diff --git a/data/30_it_3.JSON b/data/30_it_3.JSON new file mode 100644 index 0000000..d5370fc --- /dev/null +++ b/data/30_it_3.JSON @@ -0,0 +1,280 @@ +{ + "metadata": { + "id": "it_3_30", + "diagram_type": "c4_container", + "tier": 3, + "entity_count": 26, + "container_count": 4, + "attachment_count": 0, + "description": "IoT edge and stream processing platform. Field sensors publish data via an edge gateway and MQTT broker into a Kafka event stream. A schema registry governs Kafka payload contracts. A data validator cleanses incoming events before the stream processor writes metrics to a time-series database and a Redis hot cache. A rule engine and an ML anomaly detector both feed an alert manager that routes notifications via an external gateway to the maintenance team. Cold data is archived to cloud storage; the relational DB is backed up to a dedicated bucket. An OTA service pushes firmware and config updates to field devices. A central secret manager vends device certificates and service credentials, and a central IAM service authenticates operators and services. An operator manages the platform via a REST API and Grafana dashboard. All data access is logged to an internal audit log." 
+ }, + "system_boundary": { + "id": "iot_platform", + "name": "IoT & Edge Processing Platform", + "type": "deployment_environment" + }, + "elements": [ + { + "id": "iot_sensors", + "name": "IoT Sensors", + "type": "external_system", + "description": "Field instruments and smart devices (e.g., temperature, pressure, energy meters). Publish telemetry to the edge gateway using MQTT or CoAP" + }, + { + "id": "operator", + "name": "Operator", + "type": "person", + "description": "Platform administrator. Manages device configuration, monitors dashboards, and queries the REST API" + }, + { + "id": "maintenance_team", + "name": "Maintenance Team", + "type": "person", + "description": "Field technicians and on-call engineers who receive alert notifications when thresholds are breached or anomalies are detected" + }, + { + "id": "cloud_archive", + "name": "Cloud Archive", + "type": "external_system", + "description": "Long-term cold storage (e.g., AWS S3 / Azure Blob). Receives periodic exports of historical time-series data from the time-series database for compliance and long-term analysis" + }, + { + "id": "notification_gateway", + "name": "Notification Gateway", + "type": "external_system", + "description": "External alert delivery service (e.g., PagerDuty, SMS gateway, or Slack integration). Routes alert payloads from the alert manager to the maintenance team" + }, + { + "id": "edge_layer", + "name": "Edge Layer", + "type": "deployment_environment", + "boundary": "iot_platform", + "description": "Physical edge infrastructure co-located with field devices. Aggregates, normalises, and buffers sensor data before forwarding to the processing layer" + }, + { + "id": "edge_gateway", + "name": "Edge Gateway", + "type": "container", + "technology": "IoT Edge Gateway", + "boundary": "edge_layer", + "description": "Protocol-translation gateway. 
Accepts MQTT and CoAP messages from IoT sensors, authenticates devices against the device registry, normalises payloads, and publishes to the MQTT broker. Buffers to local storage if connectivity to the processing layer is lost" + }, + { + "id": "mqtt_broker", + "name": "MQTT Broker", + "type": "container", + "technology": "Mosquitto / EMQX", + "boundary": "edge_layer", + "description": "MQTT message broker. Receives normalised device messages from the edge gateway and forwards them to the Kafka event stream in the processing layer" + }, + { + "id": "device_registry", + "name": "Device Registry", + "type": "container", + "technology": "Device Identity Service", + "boundary": "edge_layer", + "description": "Device identity and authentication service. Stores device certificates, provisioning state, and registration metadata; persists records to the relational database" + }, + { + "id": "local_buffer", + "name": "Local Buffer", + "type": "container", + "technology": "Edge Data Store (SQLite / RocksDB)", + "boundary": "edge_layer", + "description": "Short-term local data buffer at the edge. Stores unforwarded messages when connectivity to the processing layer is interrupted; flushes to Kafka on reconnection" + }, + { + "id": "ota_service", + "name": "OTA Service", + "type": "container", + "technology": "Firmware / Config Update Service", + "boundary": "edge_layer", + "description": "Over-the-air update service. Pushes signed firmware images and configuration updates to field sensors via the edge gateway. Triggered by operators and looks up target devices in the device registry" + }, + { + "id": "processing_layer", + "name": "Processing Layer", + "type": "deployment_environment", + "boundary": "iot_platform", + "description": "Stream processing core. 
Validates, transforms, and analyses incoming events in real time before writing to storage and triggering alerts" + }, + { + "id": "kafka", + "name": "Kafka", + "type": "container", + "technology": "Apache Kafka", + "boundary": "processing_layer", + "description": "Distributed event streaming backbone. Receives messages from the MQTT broker and local buffer; fans them out to the data validator, stream processor, and other consumers" + }, + { + "id": "data_validator", + "name": "Data Validator", + "type": "container", + "technology": "Stream Validation Service", + "boundary": "processing_layer", + "description": "Consumes raw events from Kafka, validates schema and value ranges, enriches with device metadata, and publishes validated events back to Kafka for downstream processing" + }, + { + "id": "stream_processor", + "name": "Stream Processor", + "type": "container", + "technology": "Apache Flink / Kafka Streams", + "boundary": "processing_layer", + "description": "Real-time stream processing engine. Consumes validated events, computes aggregations and rolling statistics, writes metrics to the time-series database, caches recent readings in Redis, and routes events to the rule engine and anomaly detector" + }, + { + "id": "rule_engine", + "name": "Rule Engine", + "type": "container", + "technology": "Threshold / CEP Rules", + "boundary": "processing_layer", + "description": "Evaluates configurable threshold rules against processed metrics. Triggers the alert manager when defined conditions (e.g., temperature > 80°C, power > limit) are met" + }, + { + "id": "anomaly_detector", + "name": "Anomaly Detector", + "type": "container", + "technology": "ML Inference Service", + "boundary": "processing_layer", + "description": "Machine-learning model that detects statistical anomalies in sensor streams not covered by static threshold rules. 
Publishes anomaly events to the alert manager" + }, + { + "id": "alert_manager", + "name": "Alert Manager", + "type": "container", + "technology": "Alert Routing Service", + "boundary": "processing_layer", + "description": "Aggregates and deduplicates alerts from the rule engine and anomaly detector, applies escalation policies, and routes alert payloads to the external notification gateway" + }, + { + "id": "api_server", + "name": "API Server", + "type": "container", + "technology": "REST API", + "boundary": "processing_layer", + "description": "REST API for operator interactions. Supports device management, configuration updates, metric queries, and alert history retrieval. Logs all access to the audit log" + }, + { + "id": "schema_registry", + "name": "Schema Registry", + "type": "container", + "technology": "Confluent Schema Registry", + "boundary": "processing_layer", + "description": "Central registry of Kafka message schemas (Avro / Protobuf). Consulted by the data validator and stream processor to enforce payload contracts across producers and consumers" + }, + { + "id": "secret_manager", + "name": "Secret Manager", + "type": "container", + "technology": "HashiCorp Vault", + "boundary": "processing_layer", + "description": "Central secret store. Vends device certificates and provisioning tokens to the edge gateway and device registry, and service credentials (DB passwords, API keys) to the stream processor and API server" + }, + { + "id": "iam_service", + "name": "IAM Service", + "type": "container", + "technology": "Identity & Access Management", + "boundary": "processing_layer", + "description": "Central identity and access management service. Authenticates operators against the API server and dashboard and issues service tokens validated by the API server" + }, + { + "id": "storage_layer", + "name": "Storage Layer", + "type": "deployment_environment", + "boundary": "iot_platform", + "description": "Persistence layer. 
Stores time-series metrics, device metadata, cached recent readings, monitoring dashboards, and the compliance audit log" + }, + { + "id": "timeseries_db", + "name": "Time-Series DB", + "type": "container", + "technology": "InfluxDB / TimescaleDB", + "boundary": "storage_layer", + "description": "Primary time-series database for sensor metrics. Queried by the dashboard and API server; periodically exports cold data to the cloud archive" + }, + { + "id": "relational_db", + "name": "Relational DB", + "type": "container", + "technology": "PostgreSQL", + "boundary": "storage_layer", + "description": "Relational database for device metadata, configuration, alert rules, and alert history. Used by the device registry and API server" + }, + { + "id": "hot_cache", + "name": "Hot Cache", + "type": "container", + "technology": "Redis", + "boundary": "storage_layer", + "description": "In-memory cache of the most recent sensor readings per device. Queried by the dashboard and API server for low-latency current-state lookups" + }, + { + "id": "dashboard", + "name": "Dashboard", + "type": "container", + "technology": "Grafana", + "boundary": "storage_layer", + "description": "Real-time Grafana monitoring dashboard. Queries the time-series database for historical metrics and the hot cache for live readings. Accessible only to the operator" + }, + { + "id": "audit_log", + "name": "Audit Log", + "type": "container", + "technology": "Centralised Log Store", + "boundary": "storage_layer", + "description": "Internal compliance and access audit log. Receives data-processing events from the stream processor and access events from the API server. Accessible only via admin login" + }, + { + "id": "relational_db_backup", + "name": "Relational DB Backup", + "type": "container", + "technology": "S3-compatible Backup Bucket", + "boundary": "storage_layer", + "description": "Dedicated backup bucket for the relational database. 
Receives scheduled snapshots of device metadata, alert rules, and configuration" + } + ], + "relationships": [ + { "id":"r1", "source":"iot_sensors", "target":"edge_gateway", "label":"Sensor data · MQTT/CoAP" }, + { "id":"r2", "source":"edge_gateway", "target":"device_registry", "label":"Device auth" }, + { "id":"r3", "source":"edge_gateway", "target":"mqtt_broker", "label":"Normalised MQTT" }, + { "id":"r4", "source":"edge_gateway", "target":"local_buffer", "label":"Buffer on disconnect" }, + { "id":"r5", "source":"mqtt_broker", "target":"kafka", "label":"Publish events" }, + { "id":"r6", "source":"local_buffer", "target":"kafka", "label":"Flush on reconnect" }, + { "id":"r7", "source":"kafka", "target":"data_validator", "label":"Raw events" }, + { "id":"r8", "source":"data_validator", "target":"kafka", "label":"Validated events" }, + { "id":"r9", "source":"kafka", "target":"stream_processor", "label":"Consume validated events" }, + { "id":"r10", "source":"stream_processor", "target":"timeseries_db", "label":"Write metrics" }, + { "id":"r11", "source":"stream_processor", "target":"hot_cache", "label":"Cache recent readings · Redis" }, + { "id":"r12", "source":"stream_processor", "target":"rule_engine", "label":"Evaluate thresholds" }, + { "id":"r13", "source":"stream_processor", "target":"anomaly_detector", "label":"ML inference" }, + { "id":"r14", "source":"stream_processor", "target":"audit_log", "label":"Processing event log" }, + { "id":"r15", "source":"rule_engine", "target":"alert_manager", "label":"Threshold alert" }, + { "id":"r16", "source":"anomaly_detector", "target":"alert_manager", "label":"Anomaly alert" }, + { "id":"r17", "source":"alert_manager", "target":"notification_gateway","label":"Route alert" }, + { "id":"r18", "source":"notification_gateway","target":"maintenance_team", "label":"Alert notification" }, + { "id":"r19", "source":"device_registry", "target":"relational_db", "label":"Device metadata · SQL" }, + { "id":"r20", 
"source":"timeseries_db", "target":"cloud_archive", "label":"Cold data archival · S3" }, + { "id":"r21", "source":"operator", "target":"api_server", "label":"Manage / query" }, + { "id":"r22", "source":"operator", "target":"dashboard", "label":"Monitor" }, + { "id":"r23", "source":"api_server", "target":"timeseries_db", "label":"Query metrics · SQL" }, + { "id":"r24", "source":"api_server", "target":"relational_db", "label":"Query devices · SQL" }, + { "id":"r25", "source":"api_server", "target":"hot_cache", "label":"Live readings · Redis" }, + { "id":"r26", "source":"api_server", "target":"audit_log", "label":"Access log" }, + { "id":"r27", "source":"dashboard", "target":"timeseries_db", "label":"Historical metrics" }, + { "id":"r28", "source":"dashboard", "target":"hot_cache", "label":"Live readings" }, + { "id":"r29", "source":"data_validator", "target":"schema_registry", "label":"Fetch schema" }, + { "id":"r30", "source":"stream_processor", "target":"schema_registry", "label":"Fetch schema" }, + { "id":"r31", "source":"edge_gateway", "target":"secret_manager", "label":"Fetch device certs" }, + { "id":"r32", "source":"device_registry", "target":"secret_manager", "label":"Fetch provisioning tokens" }, + { "id":"r33", "source":"stream_processor", "target":"secret_manager", "label":"Fetch credentials" }, + { "id":"r34", "source":"api_server", "target":"secret_manager", "label":"Fetch credentials" }, + { "id":"r35", "source":"operator", "target":"iam_service", "label":"Authenticate" }, + { "id":"r36", "source":"api_server", "target":"iam_service", "label":"Validate token" }, + { "id":"r37", "source":"dashboard", "target":"iam_service", "label":"Validate token" }, + { "id":"r38", "source":"operator", "target":"ota_service", "label":"Trigger firmware update" }, + { "id":"r39", "source":"ota_service", "target":"device_registry", "label":"Lookup target devices" }, + { "id":"r40", "source":"ota_service", "target":"edge_gateway", "label":"Push firmware / config" }, + { 
"id":"r41", "source":"relational_db", "target":"relational_db_backup","label":"Scheduled backup · S3" } + ] +} diff --git a/data/30_it_3_ground_truth.MMD b/data/30_it_3_ground_truth.MMD new file mode 100644 index 0000000..d554cdd --- /dev/null +++ b/data/30_it_3_ground_truth.MMD @@ -0,0 +1,104 @@ +--- +config: + theme: default +--- +flowchart LR + iot_sensors["IoT Sensors\n[External System]\nField Devices"] + operator["Operator\n[Person]"] + maintenance_team["Maintenance Team\n[Person]"] + cloud_archive["Cloud Archive\n[External System]\nS3 / Blob Storage"] + notification_gateway["Notification Gateway\n[External System]\nPagerDuty / SMS"] + + subgraph iot_platform["IoT & Edge Processing Platform"] + subgraph edge_layer["Edge Layer"] + direction LR + edge_gateway["Edge Gateway\n[Container]\nProtocol Translation"] + mqtt_broker["MQTT Broker\n[Container]\nMosquitto / EMQX"] + device_registry["Device Registry\n[Container]\nIdentity Service"] + local_buffer[("Local Buffer\n[Container]\nEdge Data Store")] + ota_service["OTA Service\n[Container]\nFirmware / Config Updates"] + end + + subgraph processing_layer["Processing Layer"] + direction LR + kafka["Kafka\n[Container]\nEvent Stream"] + data_validator["Data Validator\n[Container]"] + stream_processor["Stream Processor\n[Container]\nFlink / Kafka Streams"] + rule_engine["Rule Engine\n[Container]\nThreshold Rules"] + anomaly_detector["Anomaly Detector\n[Container]\nML Inference"] + alert_manager["Alert Manager\n[Container]"] + api_server["API Server\n[Container]\nREST API"] + schema_registry["Schema Registry\n[Container]\nAvro / Protobuf"] + secret_manager["Secret Manager\n[Container]\nHashiCorp Vault"] + iam_service["IAM Service\n[Container]\nIdentity & Access"] + end + + subgraph storage_layer["Storage Layer"] + direction LR + timeseries_db[("Time-Series DB\n[Container]\nInfluxDB")] + relational_db[("Relational DB\n[Container]\nPostgreSQL")] + hot_cache[("Hot Cache\n[Container]\nRedis")] + 
dashboard["Dashboard\n[Container]\nGrafana"] + audit_log[("Audit Log\n[Container]\nAdmin access only")] + relational_db_backup[("Relational DB Backup\n[Container]\nBackup Bucket")] + end + end + + %% Edge — device ingestion + iot_sensors -->|"Sensor data · MQTT/CoAP"| edge_gateway + edge_gateway -->|"Device auth"| device_registry + edge_gateway -->|"Normalised MQTT"| mqtt_broker + edge_gateway -->|"Buffer on disconnect"| local_buffer + + %% Edge → Kafka + mqtt_broker -->|"Publish events"| kafka + local_buffer -->|"Flush on reconnect"| kafka + + %% Processing pipeline + kafka -->|"Raw events"| data_validator + data_validator -->|"Validated events"| kafka + kafka -->|"Consume validated events"| stream_processor + stream_processor -->|"Write metrics"| timeseries_db + stream_processor -->|"Cache recent readings"| hot_cache + stream_processor -->|"Evaluate thresholds"| rule_engine + stream_processor -->|"ML inference"| anomaly_detector + stream_processor -->|"Processing event log"| audit_log + + %% Schema governance + data_validator -->|"Fetch schema"| schema_registry + stream_processor -->|"Fetch schema"| schema_registry + + %% Alerting + rule_engine -->|"Threshold alert"| alert_manager + anomaly_detector -->|"Anomaly alert"| alert_manager + alert_manager -->|"Route alert"| notification_gateway + notification_gateway -->|"Alert notification"| maintenance_team + + %% Storage + device_registry -->|"Device metadata · SQL"| relational_db + timeseries_db -->|"Cold data archival · S3"| cloud_archive + relational_db -->|"Scheduled backup · S3"| relational_db_backup + + %% Operator + operator -->|"Authenticate"| iam_service + operator -->|"Manage / query"| api_server + operator -->|"Monitor"| dashboard + operator -->|"Trigger firmware update"| ota_service + api_server -->|"Validate token"| iam_service + dashboard -->|"Validate token"| iam_service + api_server -->|"Query metrics · SQL"| timeseries_db + api_server -->|"Query devices · SQL"| relational_db + api_server -->|"Live 
readings · Redis"| hot_cache + api_server -->|"Access log"| audit_log + dashboard -->|"Historical metrics"| timeseries_db + dashboard -->|"Live readings"| hot_cache + + %% Secret management + edge_gateway -->|"Fetch device certs"| secret_manager + device_registry -->|"Fetch provisioning tokens"| secret_manager + stream_processor -->|"Fetch credentials"| secret_manager + api_server -->|"Fetch credentials"| secret_manager + + %% OTA flow + ota_service -->|"Lookup target devices"| device_registry + ota_service -->|"Push firmware / config"| edge_gateway \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index b6dfe53..d4b7e6a 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -97,6 +97,10 @@ select = ["E", "F", "I", "W"] # importing strategy modules — those imports must see the configured # environment, so they cannot move to the top of the file. "src/maestro/run.py" = ["E402"] +# experiment_config.py is a data registry: each InputFile carries a one-line +# human description of the diagram. These are intentionally long prose strings +# where wrapping would hurt readability, so line length is not enforced here. +"src/maestro/experiment_config.py" = ["E501"] [tool.pytest.ini_options] testpaths = ["tests"] diff --git a/src/maestro/analysis/metrics.py b/src/maestro/analysis/metrics.py index da4a38a..d595f6c 100644 --- a/src/maestro/analysis/metrics.py +++ b/src/maestro/analysis/metrics.py @@ -107,106 +107,173 @@ def _lemmatize_label(label: str) -> str: # --------------------------------------------------------------------------- +# Mermaid keywords that are syntax, never node ids. 
+_SKIP = { + "graph", + "flowchart", + "subgraph", + "end", + "direction", + "style", + "classdef", + "class", + "linkstyle", + "click", +} + +# A node definition: an id, an opening shape bracket, a label that is EITHER a +# quoted string (consumed whole, so brackets/newlines INSIDE a label such as +# "Web App\n[Device]\nLaptops (WiFi)" cannot spawn phantom nodes) or unquoted +# text up to the closing bracket, then the closing bracket(s). An empty label +# ("" or '') is allowed, so nodes like gw{""} are still captured. +_NODE_DEF = re.compile( + r"(\w+)\s*" # 1: node id + r"[\[\(\{]+" # opening bracket(s): [ ( { ([ [[ (( {{ [( + r'(?:"([^"]*)"|\'([^\']*)\'|([^"\'\]\)\}|]*?))' # 2/3/4: quoted or unquoted label + r"\s*[\]\)\}]+" # closing bracket(s) +) + +# Edge label between pipes, e.g. -->|"Green (no risk)"|, stripped before node +# scanning so its text is never mistaken for a node definition. +_PIPE_LABEL = re.compile(r"\|[^|]*\|") + +# Subgraph (container) header: subgraph id OR subgraph id["Label"] +_SUBGRAPH = re.compile( + r'^\s*subgraph\s+(\w+)\s*(?:\[\s*"?([^"\]]*)"?\s*\])?', re.MULTILINE +) + +# Edge operators. Order in the alternation matters (longest / bidirectional +# first). o--o / --o / --x are association/attachment edges, NOT flow edges. +_EDGE = re.compile( + r"(\w+)\s*" + r"(<-\.->|<-->|-\.->|o--o|--o|--x|-->)" + r"\s*(?:\|[^|]*\|)?\s*" + r"(\w+)" +) +# Inline dot-delimited label form: source -. some text .-> target (message flow) +_EDGE_DOTLABEL = re.compile(r"(\w+)\s+-\.[^.|>]*\.->\s*(\w+)") +# Attachment / association edge: host o--o event (undirected, o-ended) +_ATTACH = re.compile(r"(\w+)\s*o--o\s*(?:\|[^|]*\|)?\s*(\w+)") + + +def _iter_node_defs(mermaid_code: str): + """ + Yield (id, label) for every node definition. Robust to labels that contain + brackets/newlines and to nodes defined inline on an edge line (e.g. + ``host o--o evt(("Label"))``). Skips comment lines and edge-label text.
+ """ + for raw in mermaid_code.splitlines(): + line = raw.strip() + if not line or line.startswith("%%"): + continue + # remove |edge labels| so their words can't be read as node defs + line = _PIPE_LABEL.sub(" ", line) + for m in _NODE_DEF.finditer(line): + nid = m.group(1) + if nid.lower() in _SKIP: + continue + label = (m.group(2) or m.group(3) or m.group(4) or "").strip() + yield nid, label + + +def extract_containers(mermaid_code: str) -> list[dict]: + """ + Extract subgraph containers (pools / lanes / boundaries / expanded + sub-processes). Scored as a separate dimension from entities. + Returns list of {"id": str, "label": str}. + """ + containers = [] + seen = set() + for m in _SUBGRAPH.finditer(mermaid_code): + cid = m.group(1) + if cid not in seen: + containers.append({"id": cid, "label": (m.group(2) or "").strip()}) + seen.add(cid) + return containers + + def extract_nodes(mermaid_code: str) -> list[dict]: """ - Extract node definitions from Mermaid code. + Extract ENTITY definitions (inline nodes) from Mermaid code. Returns list of {"id": str, "label": str}. - Handles two common formats: - - Standalone: node_id["Label"] or node_id(Label) - - Inline: node_id(Label) --> other_id(Other Label) + Per the scoring contract, an entity is a node drawn inline; a node drawn as + a ``subgraph`` is a container (see ``extract_containers``) and is excluded + here so it does not inflate the entity metric or the complexity tiers. 
""" + container_ids = {c["id"] for c in extract_containers(mermaid_code)} nodes = [] - seen_ids = set() - - # Keywords to skip — these are Mermaid syntax, not nodes - SKIP = {"graph", "flowchart", "subgraph", "end", "direction", "style", "classDef"} - - # Inline node pattern: matches id(Label), id["Label"], id[Label], id{Label} - # Works anywhere in a line — catches nodes inside arrow chains - inline_pattern = re.compile( - r"(\w+)\s*" # node id - r"[\[\(\{]+" # opening bracket(s) - r'["\']?\s*' # optional quote - r'([^"\]\)\}]+?)' # label (non-greedy) - r'\s*["\']?' # optional closing quote - r"[\]\)\}]+" # closing bracket(s) - ) - - for line in mermaid_code.splitlines(): - for match in inline_pattern.finditer(line): - node_id = match.group(1) - label = match.group(2).strip().strip('"').strip("'") - if node_id.lower() not in SKIP and node_id not in seen_ids: - nodes.append({"id": node_id, "label": label}) - seen_ids.add(node_id) - - # Subgraph definitions: subgraph id["Label"] - subgraph_pattern = r'subgraph\s+(\w+)\s*\["?([^"\]]*)"?\]' - for match in re.finditer(subgraph_pattern, mermaid_code): - sg_id = match.group(1) - if sg_id not in seen_ids: - nodes.append({"id": sg_id, "label": match.group(2).strip()}) - seen_ids.add(sg_id) - + seen = set() + for nid, label in _iter_node_defs(mermaid_code): + if nid in container_ids or nid in seen: + continue + nodes.append({"id": nid, "label": label}) + seen.add(nid) return nodes def extract_relationships(mermaid_code: str) -> list[dict]: """ - Extract relationship definitions from Mermaid code. + Extract flow relationships from Mermaid code. Returns list of {"source": str, "target": str, "type": str}. - Handles multiple arrow and label formats. + + Rules: + - ``-->`` directed sequence_flow. + - ``-.->`` directed message_flow (dotted), also ``-. label .->``. + - ``<-->`` / ``<-.->`` one UNDIRECTED relationship — endpoints are + canonicalised (sorted) so orientation does not matter when matching. 
+ - ``o--o`` / ``--o`` / ``--x`` attachment / association edges — NOT + relationships; excluded here and scored via ``extract_attachments``. """ relationships = [] - - patterns = [ - # source -->|label| target | source -.->|label| target | source --> target - r"(\w+)\s+(-->|-.->)\s*(?:\|([^|]*)\|)?\s*(\w+)", - # Format: source -.label.-> target (inline dot-delimited label) - r"(\w+)\s+-\..*?\.->?\s*(\w+)", - ] - seen = set() - # Pattern 1: standard arrows with optional pipe labels - for match in re.finditer(patterns[0], mermaid_code): - source = match.group(1) - arrow = match.group(2) - label = match.group(3) or "" - target = match.group(4) - # Determine type: message_flow if arrow uses dots OR label contains "message" - is_message = "-." in arrow or "message" in label.lower() - rel_type = "message_flow" if is_message else "sequence_flow" - key = (source, target) - if key not in seen: - relationships.append( - { - "source": source, - "target": target, - "type": rel_type, - } - ) - seen.add(key) - - # Pattern 2: inline label between dots e.g. -.Message Flow 1.-> - for match in re.finditer(patterns[1], mermaid_code): - source = match.group(1) - target = match.group(2) - key = (source, target) + def _add(src: str, tgt: str, rel_type: str, undirected: bool = False) -> None: + if undirected: + src, tgt = sorted((src, tgt)) + key = (src, tgt) if key not in seen: - relationships.append( - { - "source": source, - "target": target, - "type": "message_flow", - } - ) seen.add(key) + relationships.append({"source": src, "target": tgt, "type": rel_type}) + + for raw in mermaid_code.splitlines(): + line = raw.strip() + if not line or line.startswith("%%"): + continue + for m in _EDGE.finditer(line): + src, op, tgt = m.group(1), m.group(2), m.group(3) + if op in ("o--o", "--o", "--x"): + continue # attachment / association, not a flow relationship + undirected = op.startswith("<") + dotted = "." 
in op + _add(src, tgt, "message_flow" if dotted else "sequence_flow", undirected) + for m in _EDGE_DOTLABEL.finditer(line): + _add(m.group(1), m.group(2), "message_flow") return relationships +def extract_attachments(mermaid_code: str) -> list[dict]: + """ + Extract attachment / compensation-association edges (``host o--o event``). + Undirected: endpoints are canonicalised (sorted). Scored as its own + dimension, separate from flow relationships. + Returns list of {"a": str, "b": str}. + """ + attachments = [] + seen = set() + for raw in mermaid_code.splitlines(): + line = raw.strip() + if not line or line.startswith("%%"): + continue + for m in _ATTACH.finditer(line): + a, b = sorted((m.group(1), m.group(2))) + if (a, b) not in seen: + seen.add((a, b)) + attachments.append({"a": a, "b": b}) + return attachments + + # --------------------------------------------------------------------------- # Scoring helpers # --------------------------------------------------------------------------- @@ -339,6 +406,55 @@ def compute_relationship_metrics_strict( return (precision, recall, _f1(precision, recall)) +# --------------------------------------------------------------------------- +# Container metrics (subgraphs: pools / lanes / boundaries / expanded subprocs) +# --------------------------------------------------------------------------- + + +def compute_container_metrics( + output_containers: list[dict], truth_containers: list[dict] +) -> tuple | None: + """ + Score containers as a separate dimension. Returns + (id_p, id_r, id_f1, name_p, name_r, name_f1) or ``None`` when the ground + truth has no containers (metric not applicable for this diagram). + + Reuses the entity matchers: containers are {"id", "label"} dicts, so exact + ID and fuzzy name matching apply unchanged. 
+ """ + if not truth_containers: + return None + id_p, id_r, id_f1 = compute_entity_metrics_exact( + output_containers, truth_containers + ) + nm_p, nm_r, nm_f1 = compute_entity_metrics_fuzzy( + output_containers, truth_containers + ) + return (id_p, id_r, id_f1, nm_p, nm_r, nm_f1) + + +# --------------------------------------------------------------------------- +# Attachment metrics (o--o edges: boundary attachments + compensation assocs) +# --------------------------------------------------------------------------- + + +def compute_attachment_metrics( + output_attachments: list[dict], truth_attachments: list[dict] +) -> tuple | None: + """ + Score attachment edges as undirected pairs. Returns (precision, recall, f1) + or ``None`` when the ground truth has no attachments (metric N/A). + """ + truth_pairs = {tuple(sorted((a["a"], a["b"]))) for a in truth_attachments} + if not truth_pairs: + return None + output_pairs = {tuple(sorted((a["a"], a["b"]))) for a in output_attachments} + correct = len(output_pairs & truth_pairs) + precision = round(correct / len(output_pairs), 4) if output_pairs else 0.0 + recall = round(correct / len(truth_pairs), 4) + return (precision, recall, _f1(precision, recall)) + + # --------------------------------------------------------------------------- # Error taxonomy counts # --------------------------------------------------------------------------- @@ -480,11 +596,15 @@ def evaluate_run( # 1. Structural validity parses_valid, parse_error = check_mermaid_valid(output_diagram_code) - # 2. Extract nodes and relationships + # 2. 
Extract nodes, containers, relationships, attachments output_nodes = extract_nodes(output_diagram_code) truth_nodes = extract_nodes(truth_code) + output_containers = extract_containers(output_diagram_code) + truth_containers = extract_containers(truth_code) output_relationships = extract_relationships(output_diagram_code) truth_relationships = extract_relationships(truth_code) + output_attachments = extract_attachments(output_diagram_code) + truth_attachments = extract_attachments(truth_code) # 3. Entity metrics — three levels id_p, id_r, id_f1 = compute_entity_metrics_exact(output_nodes, truth_nodes) @@ -505,6 +625,14 @@ def evaluate_run( output_relationships, truth_relationships ) + # 6. Container + attachment dimensions (None when truth has none -> N/A) + container = compute_container_metrics(output_containers, truth_containers) + c_id_p, c_id_r, c_id_f1, c_nm_p, c_nm_r, c_nm_f1 = ( + container if container is not None else (None,) * 6 + ) + attach = compute_attachment_metrics(output_attachments, truth_attachments) + a_p, a_r, a_f1 = attach if attach is not None else (None, None, None) + return MetricResult( run_id=run_id, parses_valid=parses_valid, @@ -536,4 +664,17 @@ def evaluate_run( extra_relationships=relationship_tax["extra"], false_relationships=relationship_tax["false"], duplicate_relationships=relationship_tax["duplicate"], + container_id_precision=c_id_p, + container_id_recall=c_id_r, + container_id_f1=c_id_f1, + container_name_precision=c_nm_p, + container_name_recall=c_nm_r, + container_name_f1=c_nm_f1, + containers_in_output=len(output_containers), + containers_in_truth=len(truth_containers), + attachment_precision=a_p, + attachment_recall=a_r, + attachment_f1=a_f1, + attachments_in_output=len(output_attachments), + attachments_in_truth=len(truth_attachments), ) diff --git a/src/maestro/db/client.py b/src/maestro/db/client.py index 3620814..b48b95c 100644 --- a/src/maestro/db/client.py +++ b/src/maestro/db/client.py @@ -96,6 +96,23 @@ 
extra_relationships INTEGER NOT NULL, false_relationships INTEGER NOT NULL, duplicate_relationships INTEGER NOT NULL, + -- Container dimension (subgraphs). P/R/F1 nullable: NULL = no containers + -- in the ground truth (metric not applicable for that diagram). + container_id_precision REAL, + container_id_recall REAL, + container_id_f1 REAL, + container_name_precision REAL, + container_name_recall REAL, + container_name_f1 REAL, + containers_in_output INTEGER NOT NULL DEFAULT 0, + containers_in_truth INTEGER NOT NULL DEFAULT 0, + -- Attachment dimension (o--o edges). P/R/F1 nullable: NULL = no + -- attachments in the ground truth. + attachment_precision REAL, + attachment_recall REAL, + attachment_f1 REAL, + attachments_in_output INTEGER NOT NULL DEFAULT 0, + attachments_in_truth INTEGER NOT NULL DEFAULT 0, FOREIGN KEY (run_id) REFERENCES run_configs(run_id) ); """ @@ -117,6 +134,7 @@ def init_db(db_path: Path) -> None: conn.executescript(SCHEMA) _migrate_add_environment_id_column(conn) _migrate_add_retry_count_column(conn) + _migrate_add_container_attachment_columns(conn) conn.commit() @@ -150,6 +168,34 @@ def _migrate_add_retry_count_column(conn: sqlite3.Connection) -> None: ) +def _migrate_add_container_attachment_columns(conn: sqlite3.Connection) -> None: + """ + Add the container + attachment metric columns to databases that predate + them (Phase 3b). Each is added only if missing. Nullable REAL columns get + no default (NULL = metric not applicable); count columns default to 0. + Old rows keep NULL P/R/F1 and 0 counts; no backfill is attempted. 
+ """ + cols = {row[1] for row in conn.execute("PRAGMA table_info(metric_results)")} + additions = [ + ("container_id_precision", "REAL"), + ("container_id_recall", "REAL"), + ("container_id_f1", "REAL"), + ("container_name_precision", "REAL"), + ("container_name_recall", "REAL"), + ("container_name_f1", "REAL"), + ("containers_in_output", "INTEGER NOT NULL DEFAULT 0"), + ("containers_in_truth", "INTEGER NOT NULL DEFAULT 0"), + ("attachment_precision", "REAL"), + ("attachment_recall", "REAL"), + ("attachment_f1", "REAL"), + ("attachments_in_output", "INTEGER NOT NULL DEFAULT 0"), + ("attachments_in_truth", "INTEGER NOT NULL DEFAULT 0"), + ] + for name, decl in additions: + if name not in cols: + conn.execute(f"ALTER TABLE metric_results ADD COLUMN {name} {decl}") + + @contextmanager def get_connection(db_path: Path): """ diff --git a/src/maestro/db/queries.py b/src/maestro/db/queries.py index f04feb6..5b8b10b 100644 --- a/src/maestro/db/queries.py +++ b/src/maestro/db/queries.py @@ -305,14 +305,21 @@ def insert_metric_result(conn: sqlite3.Connection, metric: MetricResult) -> None missing_entities, extra_entities, false_entities, duplicate_entities, missing_relationships, extra_relationships, - false_relationships, duplicate_relationships) + false_relationships, duplicate_relationships, + container_id_precision, container_id_recall, container_id_f1, + container_name_precision, container_name_recall, container_name_f1, + containers_in_output, containers_in_truth, + attachment_precision, attachment_recall, attachment_f1, + attachments_in_output, attachments_in_truth) -- Placeholder groupings mirror the column groupings above so a -- visual scan can spot any bind-parameter misalignment: -- 4 (ids+parse) · 3 (id) · 3 (name) · 3 (lemma) · -- 3 (rel relaxed) · 3 (rel strict) · -- 2 (ent counts) · 2 (rel counts) · -- 2 (missing/extra ent) · 2 (false/dup ent) · - -- 2 (missing/extra rel) · 2 (false/dup rel) = 31 + -- 2 (missing/extra rel) · 2 (false/dup rel) · + -- 3 
(container id) · 3 (container name) · 2 (container counts) · + -- 3 (attachment) · 2 (attachment counts) = 44 VALUES (?, ?, ?, ?, ?, ?, ?, @@ -325,6 +332,11 @@ def insert_metric_result(conn: sqlite3.Connection, metric: MetricResult) -> None ?, ?, ?, ?, ?, ?, + ?, ?, + ?, ?, ?, + ?, ?, ?, + ?, ?, + ?, ?, ?, ?, ?) """, ( @@ -359,5 +371,18 @@ def insert_metric_result(conn: sqlite3.Connection, metric: MetricResult) -> None metric.extra_relationships, metric.false_relationships, metric.duplicate_relationships, + metric.container_id_precision, + metric.container_id_recall, + metric.container_id_f1, + metric.container_name_precision, + metric.container_name_recall, + metric.container_name_f1, + metric.containers_in_output, + metric.containers_in_truth, + metric.attachment_precision, + metric.attachment_recall, + metric.attachment_f1, + metric.attachments_in_output, + metric.attachments_in_truth, ), ) diff --git a/src/maestro/experiment_config.py b/src/maestro/experiment_config.py index b1da79c..9880132 100644 --- a/src/maestro/experiment_config.py +++ b/src/maestro/experiment_config.py @@ -24,23 +24,252 @@ # --------------------------------------------------------------------------- INPUTS: list[InputFile] = [ + # ── Tier 1 — BPMN (IDs 01–05, source: MIWG Category A / C) ───────────── InputFile( - example_id="bpmn_collaboration_01", + example_id="bpmn_1_01", + tier=Tier.SIMPLE, + entity_count=5, + file_path=DATA_DIR / "01_bpmn_1.JSON", + ground_truth_path=DATA_DIR / "01_bpmn_1_ground_truth.MMD", + description="Simple sequential process: Start → Task 1 → Task 2 → Task 3 → End (MIWG A.1.0)", + ), + InputFile( + example_id="bpmn_1_02", + tier=Tier.SIMPLE, + entity_count=8, + file_path=DATA_DIR / "02_bpmn_1.JSON", + ground_truth_path=DATA_DIR / "02_bpmn_1_ground_truth.MMD", + description="Single process with exclusive gateway split and merge — 3 parallel paths (MIWG A.2.0)", + ), + InputFile( + example_id="bpmn_1_03", + tier=Tier.SIMPLE, + entity_count=8, + file_path=DATA_DIR 
/ "03_bpmn_1.JSON", + ground_truth_path=DATA_DIR / "03_bpmn_1_ground_truth.MMD", + description="Single process with exclusive gateway, default flows, and convergence (MIWG A.2.1)", + ), + InputFile( + example_id="bpmn_1_04", + tier=Tier.SIMPLE, + entity_count=10, + file_path=DATA_DIR / "04_bpmn_1.JSON", + ground_truth_path=DATA_DIR / "04_bpmn_1_ground_truth.MMD", + description="Process with collapsed sub-process and two boundary events (MIWG A.3.0)", + ), + InputFile( + example_id="bpmn_1_05", + tier=Tier.SIMPLE, + entity_count=9, + file_path=DATA_DIR / "05_bpmn_1.JSON", + ground_truth_path=DATA_DIR / "05_bpmn_1_ground_truth.MMD", + description="Process with intermediate events and branching flows (MIWG C.8.0)", + ), + # ── Tier 2 — BPMN (IDs 11–15, source: MIWG A.4.0 / C) ────────────────── + InputFile( + example_id="bpmn_2_11", tier=Tier.COMPLEX, entity_count=17, - file_path=DATA_DIR / "bpmn_collaboration_01.JSON", - ground_truth_path=DATA_DIR / "bpmn_collaboration_01_ground_truth.MMD", - description="BPMN collaboration diagram with pools, lanes, message flows", - ), - # --- Add new inputs below --- - # InputFile( - # example_id="simple_flow_01", - # tier=Tier.SIMPLE, - # entity_count=6, - # file_path=DATA_DIR / "simple_flow_01.JSON", - # ground_truth_path=DATA_DIR / "simple_flow_01_ground_truth.MMD", - # description="Simple sequential flowchart, no subprocesses", - # ), + file_path=DATA_DIR / "11_bpmn_2.JSON", + ground_truth_path=DATA_DIR / "11_bpmn_2_ground_truth.MMD", + description="Two-pool BPMN collaboration with message flows, lanes, and expanded sub-processes (MIWG A.4.0)", + ), + InputFile( + example_id="bpmn_2_12", + tier=Tier.COMPLEX, + entity_count=16, + file_path=DATA_DIR / "12_bpmn_2.JSON", + ground_truth_path=DATA_DIR / "12_bpmn_2_ground_truth.MMD", + description="Multi-pool collaboration with 4 lanes and message flows (MIWG C.1.0)", + ), + InputFile( + example_id="bpmn_2_13", + tier=Tier.COMPLEX, + entity_count=18, + file_path=DATA_DIR / 
"13_bpmn_2.JSON", + ground_truth_path=DATA_DIR / "13_bpmn_2_ground_truth.MMD", + description="Four-pool collaboration with complex cross-pool message flows (MIWG C.4.0)", + ), + InputFile( + example_id="bpmn_2_14", + tier=Tier.COMPLEX, + entity_count=20, + file_path=DATA_DIR / "14_bpmn_2.JSON", + ground_truth_path=DATA_DIR / "14_bpmn_2_ground_truth.MMD", + description="Single-pool process with 3 lanes, event-based gateways and timers (MIWG C.5.0)", + ), + InputFile( + example_id="bpmn_2_15", + tier=Tier.COMPLEX, + entity_count=16, + file_path=DATA_DIR / "15_bpmn_2.JSON", + ground_truth_path=DATA_DIR / "15_bpmn_2_ground_truth.MMD", + description="Single-pool process with parallel gateways and multiple end events (MIWG C.9.0)", + ), + # ── Tier 1 — IT Architecture (IDs 06–10) ──────────────────────────────── + InputFile( + example_id="it_1_06", + tier=Tier.SIMPLE, + entity_count=5, + file_path=DATA_DIR / "06_it_1.JSON", + ground_truth_path=DATA_DIR / "06_it_1_ground_truth.MMD", + description="SomeApp: web app on Infomaniak Public Cloud behind Cloudflare CDN with S3-compatible storage", + ), + InputFile( + example_id="it_1_07", + tier=Tier.SIMPLE, + entity_count=7, + file_path=DATA_DIR / "07_it_1.JSON", + ground_truth_path=DATA_DIR / "07_it_1_ground_truth.MMD", + description="SomeApp on Infomaniak: web app + PostgreSQL + object storage, with developer SSH access", + ), + InputFile( + example_id="it_1_08", + tier=Tier.SIMPLE, + entity_count=7, + file_path=DATA_DIR / "08_it_1.JSON", + ground_truth_path=DATA_DIR / "08_it_1_ground_truth.MMD", + description="Google Apps Script web app (OU-restricted, executes as deployer) with Code.gs + Index.html reading/writing data.json on Google Drive", + ), + InputFile( + example_id="it_1_09", + tier=Tier.SIMPLE, + entity_count=8, + file_path=DATA_DIR / "09_it_1.JSON", + ground_truth_path=DATA_DIR / "09_it_1_ground_truth.MMD", + description="GCP data analysis stack: IAP → Web App (MCP host) → Gemini + MCP Server → orders view on 
PostgreSQL", + ), + InputFile( + example_id="it_1_10", + tier=Tier.SIMPLE, + entity_count=9, + file_path=DATA_DIR / "10_it_1.JSON", + ground_truth_path=DATA_DIR / "10_it_1_ground_truth.MMD", + description="Small office network: router → firewall → switch → NAS, printer, VoIP phones (wired) + AP → laptops (WiFi)", + ), + # ── Tier 2 — IT Architecture (IDs 16–20) ──────────────────────────────── + InputFile( + example_id="it_2_16", + tier=Tier.COMPLEX, + entity_count=11, + file_path=DATA_DIR / "16_it_2.JSON", + ground_truth_path=DATA_DIR / "16_it_2_ground_truth.MMD", + description="SomeApp full delivery stack: GitLab CI + Terraform + test suite + runtime on Infomaniak behind Cloudflare", + ), + InputFile( + example_id="it_2_17", + tier=Tier.COMPLEX, + entity_count=14, + file_path=DATA_DIR / "17_it_2.JSON", + ground_truth_path=DATA_DIR / "17_it_2_ground_truth.MMD", + description="Expanded office network: adds ISP (explicit WAN edge), IP cameras + NVR, badge/access control, POS terminal to it_1_10", + ), + InputFile( + example_id="it_2_18", + tier=Tier.COMPLEX, + entity_count=13, + file_path=DATA_DIR / "18_it_2.JSON", + ground_truth_path=DATA_DIR / "18_it_2_ground_truth.MMD", + description="Extended GCP stack: adds Cloud Tasks + Background Worker, Cloud Storage, Secret Manager, Cloud Monitoring to it_1_09", + ), + InputFile( + example_id="it_2_19", + tier=Tier.COMPLEX, + entity_count=19, + file_path=DATA_DIR / "19_it_2.JSON", + ground_truth_path=DATA_DIR / "19_it_2_ground_truth.MMD", + description="Dual data center: active/standby load balancing with failover; each DC has DMZ (firewall + LB) and internal LAN (web app, auth/IAM, PostgreSQL); DB and IAM replicate across DCs via encrypted WAN", + ), + InputFile( + example_id="it_2_20", + tier=Tier.COMPLEX, + entity_count=14, + file_path=DATA_DIR / "20_it_2.JSON", + ground_truth_path=DATA_DIR / "20_it_2_ground_truth.MMD", + description="Hybrid cloud / on-premises: external users via cloud CDN/LB + VPN to on-prem app server 
(PostgreSQL, NFS, Active Directory); cloud layer provides VPN gateway, object storage, and monitoring", + ), + # ── Tier 3 — BPMN (IDs 21–25, source: MIWG B / C) ────────────────────── + InputFile( + example_id="bpmn_3_21", + tier=Tier.CROSS_LAYER, + entity_count=29, + file_path=DATA_DIR / "21_bpmn_3.JSON", + ground_truth_path=DATA_DIR / "21_bpmn_3_ground_truth.MMD", + description="Two-pool collaboration with lanes, mixed task types, collapsed/expanded sub-processes, 3 call activities, message start/end events, timer start, terminate end (MIWG B.1.0)", + ), + InputFile( + example_id="bpmn_3_22", + tier=Tier.CROSS_LAYER, + entity_count=29, + file_path=DATA_DIR / "22_bpmn_3.JSON", + ground_truth_path=DATA_DIR / "22_bpmn_3_ground_truth.MMD", + description="Four-pool e-commerce collaboration: Customer / Amazon (Picker + Packager lanes) / Carrier / Credit Card Company; 5 message flows, error boundary on Checkout sub-process (MIWG C.2.0)", + ), + InputFile( + example_id="bpmn_3_23", + tier=Tier.CROSS_LAYER, + entity_count=40, + file_path=DATA_DIR / "23_bpmn_3.JSON", + ground_truth_path=DATA_DIR / "23_bpmn_3_ground_truth.MMD", + description="Travel Booking process: event-based gateway, parallel gateways, compensation patterns (boundary + throw events), Make Booking sub-process, Handle Compensation sub-process, 6 send + 6 service tasks (MIWG C.6.0)", + ), + InputFile( + example_id="bpmn_3_24", + tier=Tier.CROSS_LAYER, + entity_count=23, + file_path=DATA_DIR / "24_bpmn_3.JSON", + ground_truth_path=DATA_DIR / "24_bpmn_3_ground_truth.MMD", + description="Manual Check process (expanded C.9.2): parallel fraud + risk check split, escalation gateway to Senior Reviewer, intermediate message catch for additional documents", + ), + InputFile( + example_id="bpmn_3_25", + tier=Tier.CROSS_LAYER, + entity_count=24, + file_path=DATA_DIR / "25_bpmn_3.JSON", + ground_truth_path=DATA_DIR / "25_bpmn_3_ground_truth.MMD", + description="Vacation Request process (expanded C.8.1): balance check 
+ gateway before business-rule engine, HR Committee Review branch with intermediate message catch, new Insufficient Balance terminal", + ), + # ── Tier 3 — IT Architecture (IDs 26–30) ──────────────────────────────── + InputFile( + example_id="it_3_26", + tier=Tier.CROSS_LAYER, + entity_count=34, + file_path=DATA_DIR / "26_it_3.JSON", + ground_truth_path=DATA_DIR / "26_it_3_ground_truth.MMD", + description="Dual-DC active/standby LB with full IAM stack (IAM server + LDAP + token cache), external geofencing (zone + IP + telecom MFA), and Logging-as-a-Service for DB audit trail; extends it_2_19", + ), + InputFile( + example_id="it_3_27", + tier=Tier.CROSS_LAYER, + entity_count=32, + file_path=DATA_DIR / "27_it_3.JSON", + ground_truth_path=DATA_DIR / "27_it_3_ground_truth.MMD", + description="Two-office network (HQ + branch) with AWS hub: each office has router/fw/switch/VPN GW/NAS/NVR/cameras/POS/AP/clients; HQ adds access control + VoIP; AWS provides S3 NAS backup + S3 video archive via site-to-site VPN; remote users access via client VPN", + ), + InputFile( + example_id="it_3_28", + tier=Tier.CROSS_LAYER, + entity_count=25, + file_path=DATA_DIR / "28_it_3.JSON", + ground_truth_path=DATA_DIR / "28_it_3_ground_truth.MMD", + description="Extended GCP stack: Claude API replaces Vertex AI; external users enter via CDN+WAF+API Gateway with OAuth2; separate internal DB for external client data; vector store for RAG; event pipeline via Cloud Tasks+Pub/Sub+notification service; Cloud Scheduler for periodic jobs", + ), + InputFile( + example_id="it_3_29", + tier=Tier.CROSS_LAYER, + entity_count=25, + file_path=DATA_DIR / "29_it_3.JSON", + ground_truth_path=DATA_DIR / "29_it_3_ground_truth.MMD", + description="Full SomeApp stack: CI/CD env (GitLab runner + Terraform + tests + Vault + artifact registry), Staging env, Production env (web app + Redis + PostgreSQL + object storage + backup + Prometheus + log server); Cloudflare CDN/WAF; PostHog analytics", + ), + InputFile( + 
example_id="it_3_30", + tier=Tier.CROSS_LAYER, + entity_count=25, + file_path=DATA_DIR / "30_it_3.JSON", + ground_truth_path=DATA_DIR / "30_it_3_ground_truth.MMD", + description="IoT edge + stream processing platform: sensors → edge gateway + MQTT broker → Kafka → data validator + stream processor → time-series DB + Redis cache; rule engine + ML anomaly detector → alert manager → notification gateway; Grafana dashboard; audit log; cloud archival", + ), ] diff --git a/src/maestro/schemas.py b/src/maestro/schemas.py index 496204e..5644c33 100644 --- a/src/maestro/schemas.py +++ b/src/maestro/schemas.py @@ -302,3 +302,30 @@ class MetricResult(BaseModel): extra_relationships: int false_relationships: int duplicate_relationships: int + + # ------------------------------------------------------------------ + # Container metrics — pools / lanes / boundaries / expanded sub-processes + # (subgraphs). Scored as a SEPARATE dimension from entities so swimlane + # structure can be evaluated without polluting the entity metric/tiers. + # P/R/F1 are None when the ground truth has no containers (metric N/A), + # so aggregation can exclude those runs rather than averaging in a 0. + # ------------------------------------------------------------------ + container_id_precision: Optional[float] = None + container_id_recall: Optional[float] = None + container_id_f1: Optional[float] = None + container_name_precision: Optional[float] = None + container_name_recall: Optional[float] = None + container_name_f1: Optional[float] = None + containers_in_output: int = 0 + containers_in_truth: int = 0 + + # ------------------------------------------------------------------ + # Attachment metrics — BPMN boundary-event / compensation associations, + # drawn as ``o--o`` edges. Undirected pairs (orientation-insensitive). + # P/R/F1 are None when the ground truth has no attachments (metric N/A). 
+ # ------------------------------------------------------------------ + attachment_precision: Optional[float] = None + attachment_recall: Optional[float] = None + attachment_f1: Optional[float] = None + attachments_in_output: int = 0 + attachments_in_truth: int = 0 diff --git a/tests/analysis/test_dataset_consistency.py b/tests/analysis/test_dataset_consistency.py new file mode 100644 index 0000000..0dd3c13 --- /dev/null +++ b/tests/analysis/test_dataset_consistency.py @@ -0,0 +1,145 @@ +""" +MAESTRO — Dataset JSON <-> ground-truth MMD consistency (Phase 5). + +For every registered input this asserts that the structured JSON (the model's +INPUT) and the reference Mermaid (the expected OUTPUT) agree on: + + * entities — JSON leaf nodes/elements == MMD inline nodes + * containers — JSON-derived groupings == MMD subgraphs + * flows — JSON sequence/message/rels == MMD flow edges (undirected) + * attachments — JSON attached_to + compensation == MMD ``o--o`` edges + * metadata — entity_count / container_count / attachment_count match + +Why this matters beyond data hygiene: container-ness is *derived from the +JSON's own containment fields* (``lane`` / ``pool`` / ``parent_subprocess`` for +BPMN, ``boundary`` for IT) plus the rule "a grouping is drawn iff something +nests inside it". If this test passes, every container in the expected output is +inferable from the input — i.e. the benchmark never asks a model to produce +structure the input didn't specify. A failure means either a ground-truth bug +or an under-specified input, both of which would silently penalise models. + +Conventions encoded here (the scoring contract): + * A pool's *sole* lane is subsumed by the pool and not drawn (a lane is a + container only if its pool has more than one lane). + * The outermost ``system_boundary`` is always drawn; it is a derived + container because its top-level zones reference it via ``boundary``. + * ``<-->`` and ``o--o`` are compared as undirected pairs. 
+""" + +from __future__ import annotations + +import json +from collections import defaultdict + +import pytest + +from maestro.analysis.metrics import ( + extract_attachments, + extract_containers, + extract_nodes, + extract_relationships, +) +from maestro.experiment_config import INPUTS + + +def _json_truth(d: dict) -> tuple[set, set, set, set]: + """Derive (entities, containers, flow_pairs, attachment_pairs) from the JSON.""" + dt = d["metadata"]["diagram_type"] + if dt.startswith("bpmn"): + node_ids = {n["id"] for n in d["nodes"]} + pool_ids = {p["id"] for p in d.get("participants", d.get("pools", []))} + lane_ids = {lane["id"] for lane in d.get("lanes", [])} + parents: set = set() + for n in d["nodes"]: + for field in ("lane", "pool", "parent_subprocess"): + if n.get(field): + parents.add(n[field]) + for lane in d.get("lanes", []): + if lane.get("pool"): + parents.add(lane["pool"]) + # A pool's sole lane is not drawn (subsumed by the pool). + lanes_by_pool: dict = defaultdict(list) + for lane in d.get("lanes", []): + lanes_by_pool[lane.get("pool")].append(lane["id"]) + sole_lanes = {ls[0] for ls in lanes_by_pool.values() if len(ls) == 1} + containers = (parents & (pool_ids | lane_ids | node_ids)) - sole_lanes + entities = node_ids - containers + rel = { + tuple(sorted((f["source"], f["target"]))) + for f in d.get("sequence_flows", []) + d.get("message_flows", []) + } + att = { + tuple(sorted((n["id"], n["attached_to"]))) + for n in d["nodes"] + if n.get("attached_to") + } + att |= { + tuple(sorted((c["source"], c["target"]))) + for c in d.get("compensation_associations", []) + } + else: # c4_container / network_topology + elem_ids = {e["id"] for e in d["elements"]} + # A grouping is a container iff some element nests inside it via + # ``boundary``. The outermost system_boundary is included because the + # top-level zones reference it. 
+ containers = {e["boundary"] for e in d["elements"] if e.get("boundary")} + entities = elem_ids - containers + rel = { + tuple(sorted((r["source"], r["target"]))) + for r in d.get("relationships", []) + } + att = set() + return entities, containers, rel, att + + +def _mmd_truth(code: str) -> tuple[set, set, set, set]: + rels = extract_relationships(code) + return ( + {n["id"] for n in extract_nodes(code)}, + {c["id"] for c in extract_containers(code)}, + {tuple(sorted((r["source"], r["target"]))) for r in rels}, + {tuple(sorted((a["a"], a["b"]))) for a in extract_attachments(code)}, + ) + + +# One parametrised case per registered input — failures name the diagram. +@pytest.mark.parametrize("inp", INPUTS, ids=lambda i: i.example_id) +def test_json_mmd_structurally_consistent(inp): + d = json.loads(inp.file_path.read_text(encoding="utf-8")) + code = inp.ground_truth_path.read_text(encoding="utf-8") + je, jc, jr, ja = _json_truth(d) + me, mc, mr, ma = _mmd_truth(code) + + assert je == me, ( + f"{inp.example_id} ENTITIES differ — " + f"in JSON not MMD: {sorted(je - me)}; in MMD not JSON: {sorted(me - je)}" + ) + assert jc == mc, ( + f"{inp.example_id} CONTAINERS differ — " + f"in JSON not MMD: {sorted(jc - mc)}; in MMD not JSON: {sorted(mc - jc)}" + ) + assert jr == mr, ( + f"{inp.example_id} FLOWS differ — " + f"in JSON not MMD: {sorted(jr - mr)}; in MMD not JSON: {sorted(mr - jr)}" + ) + assert ja == ma, ( + f"{inp.example_id} ATTACHMENTS differ — " + f"in JSON not MMD: {sorted(ja - ma)}; in MMD not JSON: {sorted(ma - ja)}" + ) + + +@pytest.mark.parametrize("inp", INPUTS, ids=lambda i: i.example_id) +def test_metadata_counts_match_extractors(inp): + d = json.loads(inp.file_path.read_text(encoding="utf-8")) + code = inp.ground_truth_path.read_text(encoding="utf-8") + meta = d["metadata"] + assert meta["entity_count"] == len(extract_nodes(code)), ( + f"{inp.example_id} entity_count={meta['entity_count']} " + f"!= {len(extract_nodes(code))} inline nodes" + ) + assert 
meta.get("container_count") == len(extract_containers(code)), ( + f"{inp.example_id} container_count mismatch" + ) + assert meta.get("attachment_count") == len(extract_attachments(code)), ( + f"{inp.example_id} attachment_count mismatch" + ) diff --git a/tests/analysis/test_extraction.py b/tests/analysis/test_extraction.py new file mode 100644 index 0000000..0621f10 --- /dev/null +++ b/tests/analysis/test_extraction.py @@ -0,0 +1,235 @@ +""" +MAESTRO — Mermaid extraction unit tests (Phase 3a). + +Pure-function tests for the rewritten extractors in ``metrics.py``. They pin the +four bug-fixes from the scoring-pipeline audit and the entity/container/ +relationship/attachment split defined by the scoring contract: + + A1 ``<-->`` bidirectional edges are captured (one undirected pair). + A2 empty-label nodes (``gw{""}``) are captured, not silently dropped. + A3 phantom nodes from edge-label / multi-line-label / comment text are NOT + produced. + A4 ``o--o`` attachment edges are excluded from relationships and surfaced by + ``extract_attachments`` instead. + +Entities = inline nodes; containers = ``subgraph`` headers. These are unit tests +on the extractors only — no DB, no pydantic, no mmdc. 
+""" + +from __future__ import annotations + +from maestro.analysis.metrics import ( + compute_attachment_metrics, + compute_container_metrics, + extract_attachments, + extract_containers, + extract_nodes, + extract_relationships, +) + + +def _ids(nodes): + return {n["id"] for n in nodes} + + +def _pairs(rels): + return {(r["source"], r["target"]) for r in rels} + + +# --------------------------------------------------------------------------- +# Entity / container split +# --------------------------------------------------------------------------- + + +def test_subgraph_is_container_not_entity(): + code = """flowchart LR + subgraph pool_a["Team A"] + task_1["Do Thing"] + end + task_2["Other Thing"] + """ + assert _ids(extract_nodes(code)) == {"task_1", "task_2"} + assert _ids(extract_containers(code)) == {"pool_a"} + + +def test_collapsed_subprocess_inline_is_entity(): + # A collapsed sub-process renders inline [[ ]] -> entity (per contract B1). + code = 'flowchart LR\n sub_x[["Collapsed"]]\n a["A"]\n a --> sub_x\n' + assert "sub_x" in _ids(extract_nodes(code)) + assert _ids(extract_containers(code)) == set() + + +# --------------------------------------------------------------------------- +# A2 — empty / whitespace labels +# --------------------------------------------------------------------------- + + +def test_empty_label_node_extracted(): + code = 'flowchart LR\n gw{""}\n ev([""])\n a["A"]\n a --> gw\n' + ids = _ids(extract_nodes(code)) + assert {"gw", "ev", "a"} <= ids + + +def test_whitespace_label_node_extracted(): + code = 'flowchart LR\n gw{" "}\n a["A"]\n' + assert "gw" in _ids(extract_nodes(code)) + + +# --------------------------------------------------------------------------- +# A3 — phantom suppression +# --------------------------------------------------------------------------- + + +def test_no_phantom_from_pipe_edge_label(): + # "Green (no risk)" inside an edge label must not become a node "Green". 
+ code = ( + "flowchart LR\n" + ' gw{"Risk?"}\n' + ' deliver["Deliver"]\n' + ' gw -->|"Green (no risk)"| deliver\n' + ) + ids = _ids(extract_nodes(code)) + assert ids == {"gw", "deliver"} + assert "Green" not in ids + + +def test_no_phantom_from_multiline_bracketed_label(): + # A quoted label containing [Device] and (WiFi) must be consumed whole. + code = ( + 'flowchart LR\n user_clients["User Clients\\n[Device]\\nLaptops (WiFi)"]\n' + ) + ids = _ids(extract_nodes(code)) + assert ids == {"user_clients"} + assert "nLaptops" not in ids and "Laptops" not in ids + + +def test_comment_lines_ignored(): + code = ( + "flowchart LR\n" + " %% Fraud path (expanded) with routing\n" + ' a["A"]\n' + ' b["B"]\n' + " a --> b\n" + ) + ids = _ids(extract_nodes(code)) + assert ids == {"a", "b"} + assert "routing" not in ids and "expanded" not in ids + + +def test_inline_on_edge_node_is_extracted(): + # A node defined inline on an edge line must still be captured. + code = 'flowchart LR\n host["Host"]\n host o--o evt(("Boundary"))\n' + assert {"host", "evt"} <= _ids(extract_nodes(code)) + + +# --------------------------------------------------------------------------- +# A1 — bidirectional edges +# --------------------------------------------------------------------------- + + +def test_bidirectional_edge_is_one_undirected_pair(): + code = 'flowchart LR\n a["A"]\n b["B"]\n a <-->|"IPsec"| b\n' + pairs = _pairs(extract_relationships(code)) + # canonicalised (sorted) — exactly one pair, orientation-independent + assert pairs == {("a", "b")} + + +def test_bidirectional_dotted_edge_is_message_flow(): + code = 'flowchart LR\n a["A"]\n b["B"]\n a <-.-> b\n' + rels = extract_relationships(code) + assert len(rels) == 1 + assert rels[0]["type"] == "message_flow" + assert (rels[0]["source"], rels[0]["target"]) == ("a", "b") + + +# --------------------------------------------------------------------------- +# A4 — o--o excluded from relationships, surfaced as attachments +# 
--------------------------------------------------------------------------- + + +def test_o_o_attachment_excluded_from_relationships(): + code = ( + "flowchart LR\n" + ' host["Host"]\n' + ' evt(("Boundary"))\n' + ' nxt["Next"]\n' + " host o--o evt\n" # attachment, NOT a relationship + " evt --> nxt\n" # real outgoing sequence flow + ) + pairs = _pairs(extract_relationships(code)) + assert ("host", "evt") not in pairs and ("evt", "host") not in pairs + assert ("evt", "nxt") in pairs + + +def test_extract_attachments_is_undirected_and_deduped(): + code = 'flowchart LR\n host["Host"]\n evt(("Boundary"))\n host o--o evt\n' + atts = extract_attachments(code) + assert len(atts) == 1 + assert tuple(sorted((atts[0]["a"], atts[0]["b"]))) == ("evt", "host") + + +def test_message_flow_dotted_arrow(): + code = 'flowchart LR\n a["A"]\n b["B"]\n a -.-> b\n' + rels = extract_relationships(code) + assert rels[0]["type"] == "message_flow" + + +def test_sequence_flow_solid_arrow(): + code = 'flowchart LR\n a["A"]\n b["B"]\n a --> b\n' + rels = extract_relationships(code) + assert rels[0]["type"] == "sequence_flow" + + +# --------------------------------------------------------------------------- +# 3b — container metrics (reuse entity matchers) +# --------------------------------------------------------------------------- + + +def test_container_metrics_none_when_no_truth_containers(): + assert compute_container_metrics([], []) is None + assert compute_container_metrics([{"id": "x", "label": "X"}], []) is None + + +def test_container_metrics_perfect_match(): + truth = [{"id": "pool_a", "label": "Team A"}] + result = compute_container_metrics(truth, truth) + assert result is not None + id_p, id_r, id_f1, nm_p, nm_r, nm_f1 = result + assert id_f1 == 1.0 and nm_f1 == 1.0 + + +def test_container_metrics_partial_recall(): + truth = [{"id": "p1", "label": "A"}, {"id": "p2", "label": "B"}] + out = [{"id": "p1", "label": "A"}] + _, id_r, _, _, _, _ = compute_container_metrics(out, truth) + 
assert id_r == 0.5 + + +# --------------------------------------------------------------------------- +# 3b — attachment metrics (undirected pairs) +# --------------------------------------------------------------------------- + + +def test_attachment_metrics_none_when_no_truth_attachments(): + assert compute_attachment_metrics([], []) is None + assert compute_attachment_metrics([{"a": "x", "b": "y"}], []) is None + + +def test_attachment_metrics_perfect_match(): + truth = [{"a": "host", "b": "evt"}] + p, r, f1 = compute_attachment_metrics(truth, truth) + assert (p, r, f1) == (1.0, 1.0, 1.0) + + +def test_attachment_metrics_orientation_insensitive(): + truth = [{"a": "host", "b": "evt"}] + out = [{"a": "evt", "b": "host"}] # reversed + p, r, f1 = compute_attachment_metrics(out, truth) + assert f1 == 1.0 + + +def test_attachment_metrics_partial_and_spurious(): + truth = [{"a": "h1", "b": "e1"}, {"a": "h2", "b": "e2"}] + out = [{"a": "h1", "b": "e1"}, {"a": "h9", "b": "e9"}] # 1 correct, 1 spurious + p, r, f1 = compute_attachment_metrics(out, truth) + assert p == 0.5 and r == 0.5 diff --git a/tests/analysis/test_metrics.py b/tests/analysis/test_metrics.py index 1cd6fde..99feba0 100644 --- a/tests/analysis/test_metrics.py +++ b/tests/analysis/test_metrics.py @@ -156,6 +156,56 @@ def test_sparse_output_scores_below_ground_truth(): assert 0.0 < metric.entity_id_recall < 1.0 +def _input(example_id: str): + """Locate a registered InputFile by example_id, or skip if absent.""" + for inp in INPUTS: + if inp.example_id == example_id: + return inp + pytest.skip(f"input {example_id} not registered") + + +def test_container_and_attachment_echo_perfect(): + """ + A ground truth WITH containers and attachments, echoed back, must score + F1=1.0 on both the container and attachment dimensions (Phase 3b). Uses + bpmn_3_23 (Travel Booking: one expanded sub-process container + boundary / + compensation o--o attachments). 
+ """ + inp = _input("bpmn_3_23") + truth = inp.ground_truth_path.read_text(encoding="utf-8") + metric = evaluate_run( + run_id=uuid4(), + output_diagram_code=truth, + ground_truth_path=inp.ground_truth_path, + ) + assert metric.containers_in_truth > 0 and metric.attachments_in_truth > 0 + assert metric.container_id_f1 == 1.0 + assert metric.container_name_f1 == 1.0 + assert metric.attachment_f1 == 1.0 + assert metric.containers_in_output == metric.containers_in_truth + assert metric.attachments_in_output == metric.attachments_in_truth + + +def test_container_attachment_metrics_none_when_absent(): + """ + A ground truth with NO containers and NO attachments must report those + dimensions as None (metric not applicable), not 0.0 — so aggregation can + exclude the run rather than averaging in a spurious zero. Uses bpmn_1_01 + (a plain single-pool process). + """ + inp = _input("bpmn_1_01") + truth = inp.ground_truth_path.read_text(encoding="utf-8") + metric = evaluate_run( + run_id=uuid4(), + output_diagram_code=truth, + ground_truth_path=inp.ground_truth_path, + ) + assert metric.containers_in_truth == 0 and metric.attachments_in_truth == 0 + assert metric.container_id_f1 is None + assert metric.container_name_f1 is None + assert metric.attachment_f1 is None + + @pytest.mark.parametrize("diagram", ["", "flowchart LR\n", "not mermaid at all"]) def test_zero_entities_in_output_never_crashes(diagram: str): """