feat: dataset creation + adjustments to core to match data structures#59
Conversation
…coring Reworks the benchmark so the structured JSON input fully specifies the reference Mermaid output, fixes the scoring extractor, and adds container and attachment scoring dimensions. The dataset rework and the code changes it forces land together. Dataset (data/*.JSON, *_ground_truth.MMD) - Boundary attachments + #23 compensation links now use a dedicated `o--o` marker, recorded in JSON (attached_to / compensation_associations); they are no longer mis-scored as message flows. - #23 simplified: dropped the redundant 24h timer and the Handle Compensation sub-process; both error paths now compensate the Make Booking sub-process and then notify (fixes asymmetry + ordering). - system_boundary is now always drawn as the outermost subgraph (19, 26, 27, 30 wrapped); external actors stay outside it. - Added parent_subprocess/pool nesting links (#15, #23) so every container is derivable from the JSON's own containment fields. - entity_count corrected; container_count/attachment_count added. Complexity tiers rebalanced to 5/5/5 per category (BPMN, IT), all within band. - Renamed dataset files to NN_type ordering; removed the bpmn_collaboration_01 smoke-test pair. Scorer (src/maestro/analysis/metrics.py) - extract_nodes split into entities (inline) vs containers (subgraph). - `<-->` recognised as one undirected relationship; `o--o` excluded from flows and surfaced via extract_attachments. - Phantom nodes from edge-label / multi-line-label / comment text eliminated; empty-label nodes are captured. - New container + attachment metrics (P/R/F1, None when the truth has none so N/A runs are excluded from aggregation rather than averaged as 0). Schema / DB - MetricResult and the metric_results table gain container/attachment fields; an additive migration brings existing databases up to schema (old rows keep NULL P/R/F1 and 0 counts). Input registry - experiment_config paths fixed to the on-disk NN_type filenames; the benchmark previously could not resolve any input. example_id unchanged so DB keys stay stable. Tests / lint - New test_extraction (extractor bug-fixes), container/attachment metric tests, and test_dataset_consistency (a JSON<->MMD structural guarantee over every registered input). 119 passing; ruff clean.
|
Warning Review limit reached
More reviews will be available in 19 minutes and 3 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR extends the Maestro diagram evaluation framework to measure fidelity of diagram containers and attachments as new metric dimensions. It implements robust Mermaid extraction logic, extends the data model and persistence layer, and registers a complete 30-entry tier-rebalanced benchmark dataset across BPMN and IT architecture domains at three complexity tiers. ChangesMetrics Enhancement & Evaluation
Test Suite
Tier-Rebalanced Benchmark Dataset
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
data/16_it_2_ground_truth.MMD (1)
1-32:⚠️ Potential issue | 🟠 MajorFix
container_countmetadata mismatch indata/16_it_2.JSON
data/16_it_2.JSONis present;.metadata.entity_count=11matches the 11 Mermaid nodes, and.metadata.attachment_count=0matches (no attachment-like elements / 10 edges)..metadata.container_count=1conflicts with the actual JSON element type counts (container=6) and with the Mermaid diagram showing 6[Container]nodes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@data/16_it_2_ground_truth.MMD` around lines 1 - 32, The .metadata.container_count in data/16_it_2.JSON is incorrect (shows 1) and must reflect the six container nodes present in the Mermaid: ci_runner ("GitLab CI Runner"), terraform ("Terraform"), test_suite ("Test Suite"), web_app ("SomeApp"), postgres ("PostgreSQL"), and object_storage ("Object Storage"); update .metadata.container_count to 6 (leave .metadata.entity_count=11 and .metadata.attachment_count=0 unchanged) so the JSON metadata matches the diagram.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@data/12_bpmn_2.JSON`:
- Around line 12-21: Replace the inconsistent BPMN collaboration key by choosing
one canonical key name (either "pools" or "participants") and update all BPMN
JSON files to use that key consistently; locate JSON objects that currently
contain "pools" (e.g., the object with "id":"pool_team_assistant") and rename
that property to the chosen key, and also find objects using "participants" and
rename them to the chosen key so every collaboration structure uses the same
property name across the dataset.
In `@data/15_bpmn_2.JSON`:
- Around line 139-174: The subprocess container objects Activity_1ke2ixr and
Activity_0vp33kx are missing a pool/participant link; add a pool:
"Participant_00" property to both subprocess JSON objects so they are anchored
to Participant_00 (matching their children that use parent_subprocess).
In `@data/20_it_2_ground_truth.MMD`:
- Around line 9-25: The diagram currently defines top-level subgraphs cloud_env
and on_prem_dc; wrap both of these inside an outermost subgraph named
system_boundary so internal infrastructure is contained (e.g., create subgraph
system_boundary["System Boundary"] and move subgraph cloud_env and subgraph
on_prem_dc definitions inside it), and ensure any external actors (if present
elsewhere) remain outside that system_boundary; update identifiers cloud_env and
on_prem_dc only by relocating them into the new system_boundary block so their
internal node names (cdn_lb, vpn_gw_cloud, vpn_gw_onprem, app_server, db_server,
file_server, ad_server, cloud_storage, cloud_monitoring, firewall) are
unchanged.
In `@data/23_bpmn_3.JSON`:
- Around line 248-254: Remove the obsolete 24-hour expiry branch by deleting the
intermediate catch event node with id "icatch_24h" and any sequenceFlow,
gateway, task, or condition elements that implement the "Offer Expired" path;
update the JSON scenario (remove the object with "id":"icatch_24h" and any
references to that id in sequenceFlows or element arrays) and make the
corresponding deletions/edits in the Mermaid ground-truth so both
representations match the simplified scenario described in the PR objectives;
ensure no remaining references to "icatch_24h" or "Offer Expired" remain in
either file and validate the scenario's element/flow consistency after removal.
- Line 7: The metadata field entity_count is incorrect (currently 32) and should
match the actual number of entries in the nodes array (33); open the JSON,
locate the "entity_count" key and update its value to 33 so metadata and the
"nodes" array remain consistent, and run any dataset validation script to
confirm the counts align.
In `@data/24_bpmn_3_ground_truth.MMD`:
- Around line 63-75: The positive fraud branch currently bypasses the parallel
join; change the flow so the "Yes" output from xgw_fraud_detected goes into the
parallel join (pgw_join_fraud_risk) instead of directly to xgw_escalate, then
have pgw_join_fraud_risk continue to xgw_escalate (and keep xgw_escalate
-->|"Escalate"| user_senior_review) so both "Yes" and "No" paths synchronize at
pgw_join_fraud_risk; update the edges involving xgw_fraud_detected,
pgw_join_fraud_risk, and xgw_escalate accordingly.
In `@data/24_bpmn_3.JSON`:
- Around line 158-179: The three declared subprocess nodes
("sub_doc_validation", "sub_risk_assessment", "sub_compliance_check") are
orphaned—no flows enter/exit them and no nodes reference them via
parent_subprocess—so either wire the existing top-level nodes into these
containers by adding parent_subprocess fields on the relevant task/gateway/event
nodes (e.g., the document path nodes, accelerated path nodes, and fraud path
nodes) to point to the matching subprocess id, or remove the unused subprocess
objects entirely; update any sequenceFlows or branch nodes that should
enter/exit a subprocess so their source/target correctly reference nodes nested
under the corresponding subprocess id.
In `@data/25_bpmn_3.JSON`:
- Around line 2-10: The registry entry for benchmark "bpmn_3_25" has a stale
entity_count (currently set to 24); update the registration where "bpmn_3_25" is
defined (the code that currently sets entity_count=24) to set entity_count=27 to
match the file's metadata so reporting/validation are correct.
- Around line 182-195: The JSON nodes with ids xgw_approval_result,
xgw_manual_result, and boundary_fetch_err are missing human-readable "name"
values which must match the ground-truth Mermaid labels; update the objects for
id "xgw_approval_result" to set name "Vacation Approval", for id
"xgw_manual_result" to set name "Vacation Approved", and for id
"boundary_fetch_err" to set name "Error" so the benchmark input contains the
same labels the ground-truth expects (or alternatively remove those labels from
the ground-truth if you prefer the JSON to remain name-less).
In `@data/26_it_3.JSON`:
- Around line 178-190: The description for auth_zone2/auth_gw2 claims DC2
enforces the same geofencing and IAM policy as DC1, but the relationships only
include auth_gw2 -> iam_replica -> token_store_replica -> inner_fw2 and omit any
DC2-side geofencing or telecom MFA calls; either add the missing edges from
auth_gw2 (or iam_replica) to geofencing_svc and the telecom MFA component to
mirror DC1, or edit the auth_zone2/auth_gw2 description to remove the
geofencing/MFA assertion so it matches the modeled relationships (refer to
auth_zone2, auth_gw2, geofencing_svc, iam_replica, token_store_replica,
inner_fw2).
- Around line 2-9: The registry entry for benchmark id "it_3_26" in the
experiment registry currently sets entity_count=34 but the canonical dataset
(metadata.id "it_3_26") declares entity_count 28; update the registry record for
it_3_26 (the registry entry that sets entity_count for "it_3_26") to
entity_count=28 so the config (entity_count) matches the JSON metadata and keep
container_count/attachment_count unchanged unless other metadata differs.
In `@data/28_it_3.JSON`:
- Around line 166-171: The description for cloud_monitoring advertises telemetry
from notification_svc but the graph lacks a relationship; either add a
relationship object linking notification_svc -> cloud_monitoring (matching how
web_app and background_worker relate to cloud_monitoring) or change the
"description" text to remove notification service; update the resource with id
"cloud_monitoring" and/or the relationships for the component with id
"notification_svc" so the structured graph and the human-readable description
remain consistent.
---
Outside diff comments:
In `@data/16_it_2_ground_truth.MMD`:
- Around line 1-32: The .metadata.container_count in data/16_it_2.JSON is
incorrect (shows 1) and must reflect the six container nodes present in the
Mermaid: ci_runner ("GitLab CI Runner"), terraform ("Terraform"), test_suite
("Test Suite"), web_app ("SomeApp"), postgres ("PostgreSQL"), and object_storage
("Object Storage"); update .metadata.container_count to 6 (leave
.metadata.entity_count=11 and .metadata.attachment_count=0 unchanged) so the
JSON metadata matches the diagram.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: febc4dd6-4a6e-47b6-aed8-51a00ae4cb41
📒 Files selected for processing (71)
.claude/SESSION_RECAP_2026-05-06.md.claude/maestro_tier_rebalance_worksheet.mddata/01_bpmn_1.JSONdata/01_bpmn_1_ground_truth.MMDdata/02_bpmn_1.JSONdata/02_bpmn_1_ground_truth.MMDdata/03_bpmn_1.JSONdata/03_bpmn_1_ground_truth.MMDdata/04_bpmn_1.JSONdata/04_bpmn_1_ground_truth.MMDdata/05_bpmn_1.JSONdata/05_bpmn_1_ground_truth.MMDdata/06_it_1.JSONdata/06_it_1_ground_truth.MMDdata/07_it_1.JSONdata/07_it_1_ground_truth.MMDdata/08_it_1.JSONdata/08_it_1_ground_truth.MMDdata/09_it_1.JSONdata/09_it_1_ground_truth.MMDdata/10_it_1.JSONdata/10_it_1_ground_truth.MMDdata/11_bpmn_2.JSONdata/11_bpmn_2_ground_truth.MMDdata/12_bpmn_2.JSONdata/12_bpmn_2_ground_truth.MMDdata/13_bpmn_2.JSONdata/13_bpmn_2_ground_truth.MMDdata/14_bpmn_2.JSONdata/14_bpmn_2_ground_truth.MMDdata/15_bpmn_2.JSONdata/15_bpmn_2_ground_truth.MMDdata/16_it_2.JSONdata/16_it_2_ground_truth.MMDdata/17_it_2.JSONdata/17_it_2_ground_truth.MMDdata/18_it_2.JSONdata/18_it_2_ground_truth.MMDdata/19_it_2.JSONdata/19_it_2_ground_truth.MMDdata/20_it_2.JSONdata/20_it_2_ground_truth.MMDdata/21_bpmn_3.JSONdata/21_bpmn_3_ground_truth.MMDdata/22_bpmn_3.JSONdata/22_bpmn_3_ground_truth.MMDdata/23_bpmn_3.JSONdata/23_bpmn_3_ground_truth.MMDdata/24_bpmn_3.JSONdata/24_bpmn_3_ground_truth.MMDdata/25_bpmn_3.JSONdata/25_bpmn_3_ground_truth.MMDdata/26_it_3.JSONdata/26_it_3_ground_truth.MMDdata/27_it_3.JSONdata/27_it_3_ground_truth.MMDdata/28_it_3.JSONdata/28_it_3_ground_truth.MMDdata/29_it_3.JSONdata/29_it_3_ground_truth.MMDdata/30_it_3.JSONdata/30_it_3_ground_truth.MMDpyproject.tomlsrc/maestro/analysis/metrics.pysrc/maestro/db/client.pysrc/maestro/db/queries.pysrc/maestro/experiment_config.pysrc/maestro/schemas.pytests/analysis/test_dataset_consistency.pytests/analysis/test_extraction.pytests/analysis/test_metrics.py
- Rename top-level 'pools' -> 'participants' in tier-2 BPMN files (12-15) for a single canonical collaboration key across the corpus - Remove 3 disconnected subProcess nodes from bpmn_3_24 (no children, no flows); update ground truth and entity_count 26 -> 23 - Refresh bpmn_3_24 registry description and entity_count to match
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
data/24_bpmn_3.JSON (1)
262-326:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftCritical: Parallel gateway deadlock in fraud evaluation path.
The fraud evaluation flow (sf10–sf20) contains a structural error that causes a deadlock:
pgw_split_fraud_risk(parallel split) creates two concurrent tokens:
- Token A: →
user_check_fraud→xgw_fraud_detected- Token B: →
user_assess_risk→pgw_join_fraud_risk
xgw_fraud_detected(exclusive gateway) branches:
- "No" (sf15): Token A proceeds to
pgw_join_fraud_risk→ both tokens arrive, join completes ✓- "Yes" (sf17): Token A proceeds to
xgw_escalate→ Token A never reachespgw_join_fraud_riskWhen the "Yes" path is taken,
pgw_join_fraud_riskwaits indefinitely for the second token that was redirected toxgw_escalate, causing the process instance to hang.Root cause: An exclusive divergence (
xgw_fraud_detected) between a parallel split and join violates the parallel gateway contract, which requires both branches to complete before synchronization.Fix: Move the fraud decision logic (
xgw_fraud_detected→xgw_escalate) after the parallel join so both branches complete first:
pgw_split_fraud_risk→ [user_check_fraud,user_assess_risk] →pgw_join_fraud_risk→xgw_fraud_detected→ branching logic.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@data/24_bpmn_3.JSON` around lines 262 - 326, The parallel-split/join around the fraud path causes a deadlock because xgw_fraud_detected (exclusive) can divert Token A away from pgw_join_fraud_risk; move the exclusive decision so both parallel branches complete before branching: change the flow so pgw_split_fraud_risk still creates user_check_fraud and user_assess_risk, both flow into pgw_join_fraud_risk, and then pgw_join_fraud_risk connects to xgw_fraud_detected which then routes to xgw_escalate or end_fraud_not_detected/end_fraud_detected as appropriate (update sequence flows sf11–sf20 to reflect join before xgw_fraud_detected and remove the sf17 path that bypasses pgw_join_fraud_risk).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@data/24_bpmn_3.JSON`:
- Line 10: The description currently claims "Three event sub-processes..." but
the BPMN `nodes` array contains no nodes with "type": "subProcess"; update the
"description" string to accurately describe the actual structure—e.g., replace
the existing text with something like "Three parallel event-driven flows
(document request, accelerated decision, fraud evaluation) initiate from
separate start events"—and ensure no reference to "subProcess" remains unless
you convert the three flows to actual subProcess nodes in the `nodes` array;
locate the "description" key in the JSON and edit its value accordingly.
---
Outside diff comments:
In `@data/24_bpmn_3.JSON`:
- Around line 262-326: The parallel-split/join around the fraud path causes a
deadlock because xgw_fraud_detected (exclusive) can divert Token A away from
pgw_join_fraud_risk; move the exclusive decision so both parallel branches
complete before branching: change the flow so pgw_split_fraud_risk still creates
user_check_fraud and user_assess_risk, both flow into pgw_join_fraud_risk, and
then pgw_join_fraud_risk connects to xgw_fraud_detected which then routes to
xgw_escalate or end_fraud_not_detected/end_fraud_detected as appropriate (update
sequence flows sf11–sf20 to reflect join before xgw_fraud_detected and remove
the sf17 path that bypasses pgw_join_fraud_risk).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 142c27ed-e96b-472b-962b-7e168414fd5c
📒 Files selected for processing (9)
data/12_bpmn_2.JSONdata/13_bpmn_2.JSONdata/14_bpmn_2.JSONdata/15_bpmn_2.JSONdata/24_bpmn_3.JSONdata/24_bpmn_3_ground_truth.MMDsrc/maestro/analysis/metrics.pysrc/maestro/experiment_config.pytests/analysis/test_extraction.py
💤 Files with no reviewable changes (1)
- data/24_bpmn_3_ground_truth.MMD
✅ Files skipped from review due to trivial changes (2)
- data/12_bpmn_2.JSON
- data/13_bpmn_2.JSON
🚧 Files skipped from review as they are similar to previous changes (5)
- data/15_bpmn_2.JSON
- data/14_bpmn_2.JSON
- tests/analysis/test_extraction.py
- src/maestro/experiment_config.py
- src/maestro/analysis/metrics.py
- Anchor bpmn_2_15 subprocesses (Activity_1ke2ixr, Activity_0vp33kx) to Participant_00 to match the nesting the ground truth draws - Fix bpmn_3_24 parallel-join deadlock: both fraud/risk checks now join before the exclusive fraud decision (JSON + ground truth); refresh the stale metadata description left by the earlier subprocess removal - Correct it_3_26 and it_3_28 descriptions to match modeled relationships (DC2 geofencing claim, cloud_monitoring telemetry sources)
|
@coderabbitai please do a focused re-review of the dataset changes in the latest commit (bd570c0). No need for a full repo pass. The relevant files are:
Addressed:
Intentionally not changed (with rationale):
All Could you confirm the remaining open threads are resolved and approve for merge if the dataset changes look good and adjust your current timeout message in the original comment accordingly? Thanks! |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== Check pools vs participants key in tier-2 BPMN files ==="
for f in data/12_bpmn_2.JSON data/13_bpmn_2.JSON data/14_bpmn_2.JSON data/15_bpmn_2.JSON; do
echo "--- $f top-level keys ---"
python -c "import json; d=json.load(open('$f')); print(list(d.keys()))"
done
echo ""
echo "=== 15_bpmn_2.JSON - check Activity_1ke2ixr and Activity_0vp33kx pool field ==="
python -c "
import json
d = json.load(open('data/15_bpmn_2.JSON'))
key = list(d.keys())[0]
nodes = d[key].get('nodes', [])
for n in nodes:
if n.get('id') in ('Activity_1ke2ixr', 'Activity_0vp33kx'):
print(n)
"
echo ""
echo "=== 24_bpmn_3.JSON - check entity_count and nodes ==="
python -c "
import json
d = json.load(open('data/24_bpmn_3.JSON'))
key = list(d.keys())[0]
meta = d[key].get('metadata', {})
print('entity_count:', meta.get('entity_count'))
nodes = d[key].get('nodes', [])
print('num nodes:', len(nodes))
# Check for orphan subprocesses (no children, no flows)
ids = {n['id'] for n in nodes}
print('node types:', [(n['id'], n.get('type')) for n in nodes])
"
echo ""
echo "=== 24_bpmn_3.JSON - check parallel join structure ==="
python -c "
import json
d = json.load(open('data/24_bpmn_3.JSON'))
key = list(d.keys())[0]
nodes = d[key].get('nodes', [])
flows = d[key].get('sequence_flows', [])
# Find pgw_join_fraud_risk and xgw_fraud_detected
for n in nodes:
if 'fraud' in n['id'].lower() or 'join' in n['id'].lower():
print('node:', n['id'], n.get('type'), n.get('name',''))
print('---flows to/from fraud detection gateways---')
for f in flows:
if any(x in f.get('source','') + f.get('target','') for x in ['fraud', 'join_fraud', 'pgw_join']):
print(f)
"
echo ""
echo "=== 24_bpmn_3_ground_truth.MMD - show content ==="
cat data/24_bpmn_3_ground_truth.MMD
echo ""
echo "=== 26_it_3.JSON - check auth_gw2 and cloud_monitoring descriptions ==="
python -c "
import json
d = json.load(open('data/26_it_3.JSON'))
key = list(d.keys())[0]
elements = d[key].get('elements', [])
for e in elements:
if e.get('id') in ('auth_gw2', 'cloud_monitoring'):
print(e)
"
echo ""
echo "=== 28_it_3.JSON - check cloud_monitoring description ==="
python -c "
import json
d = json.load(open('data/28_it_3.JSON'))
key = list(d.keys())[0]
elements = d[key].get('elements', [])
for e in elements:
if 'monitoring' in e.get('id','').lower() or 'monitoring' in e.get('name','').lower():
print(e)
"
echo ""
echo "=== experiment_config.py - check INPUTS list ==="
python -c "
import ast, sys
with open('src/maestro/experiment_config.py') as f:
src = f.read()
# just show the INPUTS variable entries count and example_ids
import re
example_ids = re.findall(r'example_id\s*=\s*[\"\'](.*?)[\"\']\b', src)
print('Number of INPUTS entries:', len(example_ids))
print('example_ids:', example_ids)
"
echo ""
echo "=== Check file paths in experiment_config.py match actual files ==="
python -c "
import re, os
with open('src/maestro/experiment_config.py') as f:
src = f.read()
paths = re.findall(r'DATA_DIR\s*/\s*[\"\'](.*?)[\"\']\b', src)
print('Paths referenced:')
for p in paths:
exists = os.path.exists(os.path.join('data', p))
print(f' data/{p}: {\"OK\" if exists else \"MISSING\"}')"Length of output: 7112 🏁 Script executed: #!/bin/bash
echo "=== 15_bpmn_2.JSON structure ==="
python -c "
import json
d = json.load(open('data/15_bpmn_2.JSON'))
key = list(d.keys())[0]
meta = d.get('metadata', {})
print('Top-level keys:', list(d.keys()))
print('metadata id:', meta.get('id') if meta else 'N/A')
nodes = d.get('nodes', [])
print('Total nodes:', len(nodes))
for n in nodes:
if 'subprocess' in n.get('type','').lower() or n.get('id','') in ('Activity_1ke2ixr','Activity_0vp33kx'):
print(' SubP node:', n)
"
echo ""
echo "=== 24_bpmn_3.JSON structure ==="
python -c "
import json
d = json.load(open('data/24_bpmn_3.JSON'))
print('Top-level keys:', list(d.keys()))
# Try direct access
meta = d.get('metadata', {})
print('entity_count:', meta.get('entity_count'))
nodes = d.get('nodes', [])
print('num nodes:', len(nodes))
for n in nodes:
if 'fraud' in n.get('id','').lower() or 'join' in n.get('id','').lower() or n.get('type','') == 'subProcess':
print(' node:', n.get('id'), n.get('type'), n.get('name',''))
"
echo ""
echo "=== 24_bpmn_3.JSON parallel join flows ==="
python -c "
import json
d = json.load(open('data/24_bpmn_3.JSON'))
flows = d.get('sequence_flows', [])
print('Total flows:', len(flows))
for f in flows:
src = f.get('source','')
tgt = f.get('target','')
if any(x in src+tgt for x in ['fraud','join','pgw','risk','escalat']):
print(f' {src} --> {tgt} name={f.get(\"name\",\"\")}')
"
echo ""
echo "=== 26_it_3.JSON structure + element IDs ==="
python -c "
import json
d = json.load(open('data/26_it_3.JSON'))
print('Top-level keys:', list(d.keys()))
meta = d.get('metadata',{})
print('id:', meta.get('id'))
elements = d.get('elements', [])
print('Total elements:', len(elements))
for e in elements:
if 'auth' in e.get('id','').lower() or 'monitor' in e.get('id','').lower() or 'monitor' in e.get('name','').lower():
print(' el:', e.get('id'), '|', e.get('name',''), '|', e.get('description','')[:80])
"
echo ""
echo "=== 28_it_3.JSON structure + monitoring elements ==="
python -c "
import json
d = json.load(open('data/28_it_3.JSON'))
print('Top-level keys:', list(d.keys()))
elements = d.get('elements', [])
print('Total elements:', len(elements))
for e in elements:
if 'monitor' in e.get('id','').lower() or 'monitor' in e.get('name','').lower():
print(' el:', e.get('id'), '|', e.get('name',''), '|', e.get('description','')[:100])
"
echo ""
echo "=== experiment_config.py INPUTS and file paths ==="
python -c "
import sys
sys.path.insert(0, 'src')
try:
from maestro.experiment_config import INPUTS
print('Number of INPUTS:', len(INPUTS))
for inp in INPUTS:
import os
fp = str(inp.file_path)
gtp = str(inp.ground_truth_path)
fp_ok = os.path.exists(fp)
gtp_ok = os.path.exists(gtp)
status = 'OK' if (fp_ok and gtp_ok) else f'MISSING: {\"file\" if not fp_ok else \"\"}{\" gt\" if not gtp_ok else \"\"}'
print(f' {inp.example_id}: {status}')
except Exception as e:
print('Error:', e)
"Length of output: 6390
The rationale for the intentionally unchanged items (entity count semantics, registry annotations, Everything is in order — approving. 🐇 [approve] |
contract-aligned ground truth + container/attachment s…coring
Reworks the benchmark so the structured JSON input fully specifies the reference Mermaid output, fixes the scoring extractor, and adds container and attachment scoring dimensions. The dataset rework and the code changes it forces land together.
Dataset (data/*.JSON, *_ground_truth.MMD)
o--omarker, recorded in JSON (attached_to / compensation_associations); they are no longer mis-scored as message flows.Scorer (src/maestro/analysis/metrics.py)
<-->recognised as one undirected relationship;o--oexcluded from flows and surfaced via extract_attachments.Schema / DB
Input registry
Tests / lint
Summary by CodeRabbit
New Features
Tests
Documentation