Conversation
Add the data views over the experiment database, each following the charts_reference template (read-only connect → query → themed figure → render_chart) and the visualization design guide: - Overview: operational summary (metric cards, runs-by-strategy split by success, cost per strategy). - Strategy Comparison (RQ1/RQ2): grouped entity/relationship F1 bars per strategy, with tier/model filters and an F1/precision/recall toggle; controls excluded. - Pareto (RQ4): entity-ID F1 vs cost and vs latency scatter, colored by strategy and shaped by tier, with a per-run detail table (no hover on static figures). - Run Detail: per-run input/ground-truth (read from the file system via experiment_config.INPUTS), generated diagram + parse badge, metric breakdown bar, and the multi-step sub-call trace. - Hallucination Taxonomy (RQ3): stacked entity/relationship error-count bars per strategy, with a view-local four-category error palette. - Diagram Visualizer (intentional addition beyond the five-view spec): ground-truth vs generated diagram side by side, with a Code/Visualization toggle. Visualization renders via the mmdc CLI — the same engine the metric pipeline uses for parses_valid — so the picture matches the recorded validity and is deterministic/reproducible. Falls back to source when mmdc is unavailable or a source fails to render. All view queries live in viz/queries.py (control-aware, identifier-validated, graceful on a schemaless DB). Views degrade to empty-states on sparse data. Rename the Home view render fn and register all views in the VIEWS registry. tests/viz/test_views.py covers the query layer against synthetic multi-strategy/tier/control data, plus the Mermaid-render fallbacks.
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughImplements a Streamlit visualization dashboard: adds Mermaid-to-SVG rendering, a robust read-only SQL query layer for dashboard metrics and taxonomy, a concrete view registry with seven Streamlit views, and comprehensive tests validating queries, views, and Mermaid-fallback behavior. ChangesStreamlit Viz Dashboard
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/maestro/viz/queries.py (1)
369-379: 💤 Low valueConsider explicit column selection over
SELECT *with joins.
SELECT c.*, r.*, m.*across JOINed tables risks column shadowing if any tables share column names beyond the join key (run_id). The last value wins in the resulting dict. While this works today, explicit column enumeration would be more robust if schemas evolve.Fine to keep as-is for now given this is a read-only dashboard query.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/maestro/viz/queries.py` around lines 369 - 379, The query uses SELECT c.*, r.*, m.* which can cause silent column shadowing when joined tables share column names; update this SQL in the block that executes conn.execute(...) with run_id so it explicitly lists each needed column and/or uses aliases (e.g., c.some_col AS c_some_col, r.some_col AS r_some_col, m.some_col AS m_some_col) for all columns returned from run_configs, run_results, and metric_results to avoid collisions before creating dict(row). Ensure the returned dict keys match the new aliases or adjust downstream code that consumes dict(row).tests/viz/test_views.py (2)
111-129: 💤 Low valueOptional: extract a
populated_connpytest fixture.The
conn = _conn(); _populate(conn)pair is repeated across ~10 tests. A fixture would cut the boilerplate and let you close connections deterministically (the in-memory connections are currently never closed).♻️ Example fixture
`@pytest.fixture` def populated_conn(): conn = _conn() _populate(conn) yield conn conn.close()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/viz/test_views.py` around lines 111 - 129, Extract a pytest fixture to avoid repeating conn setup: create a fixture named populated_conn that calls _conn(), invokes _populate(conn) and yields the connection, then closes it after the test; update tests that currently call conn = _conn(); _populate(conn) to accept the populated_conn fixture instead and remove manual close logic (reference symbols: _conn, _populate, tests in tests/viz/test_views.py).
321-341: 💤 Low valueOptional: section header doesn't match the tests below it.
The
Graceful degradation … schemaless DBbanner at Lines 321-323 sits directly above the two Mermaid render tests, which belong under their own "Mermaid rendering" header. The actual schemaless test starts at Line 343.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/viz/test_views.py` around lines 321 - 341, The section header "Graceful degradation — every query no-ops on an empty (schemaless) DB" is misplaced above the Mermaid tests (test_mermaid_render_blank_source_returns_none, test_mermaid_render_handles_missing_mmdc) — move that banner (or replace it) so the Mermaid tests are grouped under a new "Mermaid rendering" header and the schemaless DB header sits immediately above the actual schemaless test (starting at the later test), ensuring test organization matches headers and keeping function names test_mermaid_render_blank_source_returns_none and test_mermaid_render_handles_missing_mmdc together under the new header.src/maestro/viz/views/pareto.py (1)
90-105: ⚡ Quick winNumeric Pareto fields are non-null (so rounding/scatter won’t TypeError under current schema)
pareto_pointspullsrun_results.cost_usd,run_results.duration_ms, andmetric_results.entity_id_f1via INNER JOINs, and those columns are declaredNOT NULLinsrc/maestro/db/client.py. The write paths type them as non-optional (RunResult.cost_usd/duration_ms,MetricResult.entity_id_f1), and the metrics pipeline returnsentity_id_f1=0.0even when ground truth is missing—soround(p["cost_usd"], ...),p["duration_ms"],round(p["entity_id_f1"], ...), and theax.scatter(...)calls insrc/maestro/viz/views/pareto.pyshould not seeNone.Optional: if you expect legacy DBs with NULLs in those columns, filter out rows with NULLs before rounding/scattering.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/maestro/viz/views/pareto.py` around lines 90 - 105, The numeric fields (cost_usd, duration_ms, entity_id_f1) are non-null under current schema, but to be defensive against legacy DBs you should filter out any points with null/None values before rounding or plotting: update the code that builds/uses the points list (the comprehension that feeds st.dataframe and the plotting code that calls ax.scatter) to first filter points where p["cost_usd"] is not None and p["duration_ms"] is not None and p["entity_id_f1"] is not None, then perform round(...) and plotting only on that filtered list; reference the existing symbols points, st.dataframe, strategy_display_name, and ax.scatter when making the change.src/maestro/viz/views/hallucination.py (1)
128-138: 💤 Low valueWhite value labels have low contrast on the lighter segment colors.
White text on
#95A5A6(gray, ~2:1) and#3498DB(blue, ~3:1) fails WCAG AA for the smallfontsize=8labels. Consider choosing label color per-segment based on luminance for legibility.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/maestro/viz/views/hallucination.py` around lines 128 - 138, The white labels are low-contrast on some segment colors; in the loop that draws labels (the for xi, (h, b) in enumerate(zip(heights, bottom)) block where ax.text(...) is called), compute the segment fill color and its relative luminance (e.g., from ax.patches[xi].get_facecolor() or the colors list used to draw the bars), then choose label color based on that luminance (dark text like "black" for light fills, white for dark fills) before calling ax.text; implement a simple luminance check (Y = 0.2126*R + 0.7152*G + 0.0722*B) and a threshold (~0.5 or tuned) to pick the contrasting label color so small fontsize=8 labels meet legibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/maestro/viz/views/diagram_visualizer.py`:
- Around line 144-149: The helper _read_or_note currently only catches OSError
around Path(path).read_text(encoding="utf-8"), but read_text can raise
UnicodeDecodeError for non-UTF-8 files; update the except clause to catch both
OSError and UnicodeDecodeError (e.g., except (OSError, UnicodeDecodeError) as e)
when calling Path.read_text so a missing or non-UTF-8 file returns the "(file
not found: ...)" note instead of crashing; keep the return string behavior and
reference the Path.read_text call and _read_or_note function.
In `@src/maestro/viz/views/run_detail.py`:
- Around line 84-95: _fmt_ts currently calls datetime.fromisoformat(ts) without
guarding against None or empty values, causing a TypeError; update _fmt_ts to
first check that ts is a truthy string (e.g., if not ts: return ts) before
attempting to parse, then proceed with the existing try/except ValueError and
call format_for_display on the parsed datetime; reference symbols: _fmt_ts,
datetime.fromisoformat, format_for_display, viz_settings.current_settings.
---
Nitpick comments:
In `@src/maestro/viz/queries.py`:
- Around line 369-379: The query uses SELECT c.*, r.*, m.* which can cause
silent column shadowing when joined tables share column names; update this SQL
in the block that executes conn.execute(...) with run_id so it explicitly lists
each needed column and/or uses aliases (e.g., c.some_col AS c_some_col,
r.some_col AS r_some_col, m.some_col AS m_some_col) for all columns returned
from run_configs, run_results, and metric_results to avoid collisions before
creating dict(row). Ensure the returned dict keys match the new aliases or
adjust downstream code that consumes dict(row).
In `@src/maestro/viz/views/hallucination.py`:
- Around line 128-138: The white labels are low-contrast on some segment colors;
in the loop that draws labels (the for xi, (h, b) in enumerate(zip(heights,
bottom)) block where ax.text(...) is called), compute the segment fill color and
its relative luminance (e.g., from ax.patches[xi].get_facecolor() or the colors
list used to draw the bars), then choose label color based on that luminance
(dark text like "black" for light fills, white for dark fills) before calling
ax.text; implement a simple luminance check (Y = 0.2126*R + 0.7152*G + 0.0722*B)
and a threshold (~0.5 or tuned) to pick the contrasting label color so small
fontsize=8 labels meet legibility.
In `@src/maestro/viz/views/pareto.py`:
- Around line 90-105: The numeric fields (cost_usd, duration_ms, entity_id_f1)
are non-null under current schema, but to be defensive against legacy DBs you
should filter out any points with null/None values before rounding or plotting:
update the code that builds/uses the points list (the comprehension that feeds
st.dataframe and the plotting code that calls ax.scatter) to first filter points
where p["cost_usd"] is not None and p["duration_ms"] is not None and
p["entity_id_f1"] is not None, then perform round(...) and plotting only on that
filtered list; reference the existing symbols points, st.dataframe,
strategy_display_name, and ax.scatter when making the change.
In `@tests/viz/test_views.py`:
- Around line 111-129: Extract a pytest fixture to avoid repeating conn setup:
create a fixture named populated_conn that calls _conn(), invokes
_populate(conn) and yields the connection, then closes it after the test; update
tests that currently call conn = _conn(); _populate(conn) to accept the
populated_conn fixture instead and remove manual close logic (reference symbols:
_conn, _populate, tests in tests/viz/test_views.py).
- Around line 321-341: The section header "Graceful degradation — every query
no-ops on an empty (schemaless) DB" is misplaced above the Mermaid tests
(test_mermaid_render_blank_source_returns_none,
test_mermaid_render_handles_missing_mmdc) — move that banner (or replace it) so
the Mermaid tests are grouped under a new "Mermaid rendering" header and the
schemaless DB header sits immediately above the actual schemaless test (starting
at the later test), ensuring test organization matches headers and keeping
function names test_mermaid_render_blank_source_returns_none and
test_mermaid_render_handles_missing_mmdc together under the new header.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4de40a71-44b0-48a2-9ac6-f81c0bbc6000
📒 Files selected for processing (10)
src/maestro/viz/mermaid_render.pysrc/maestro/viz/queries.pysrc/maestro/viz/views/__init__.pysrc/maestro/viz/views/diagram_visualizer.pysrc/maestro/viz/views/hallucination.pysrc/maestro/viz/views/overview.pysrc/maestro/viz/views/pareto.pysrc/maestro/viz/views/run_detail.pysrc/maestro/viz/views/strategy_comparison.pytests/viz/test_views.py
- _read_or_note (run_detail, diagram_visualizer): also catch UnicodeDecodeError so a non-UTF-8 input/ground-truth file returns a note instead of crashing the view. - _fmt_ts (run_detail): guard against None/empty timestamps and catch TypeError, not just ValueError, so a partial/legacy row can't crash the run picker. - run_detail query: replace SELECT c.*, r.*, m.* with an explicit aliased column list — all three tables carry run_id, and the star-join silently shadowed it; the explicit list also documents exactly what the views read. - hallucination: pick value-label color by segment-fill luminance (white on dark, #333333 on light), per the design guide's cell-annotation rule, so labels stay legible on the lighter error colors. - tests: split the misplaced section header so the Mermaid-render tests and the schemaless-DB test sit under accurate banners. Skipped: Pareto None-filtering (inner joins + NOT NULL schema already guarantee non-null; would be dead code) and the populated_conn fixture (cosmetic; in-memory connections need no teardown).
|
@coderabbitai addressed your raised comments in a8248cb. please check again :) and also update your initial comment to reflect the findings (currently the time out screen). |
🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Get PR review comments from coderabbitai
gh pr view 56 --repo Colinho22/maestro --json reviews,comments
# Get the commit details
gh api repos/Colinho22/maestro/commits/a8248cbfeead600a9aefb83300062d7aec792edb --jq '.commit.message'
# Get the commit diff/files changed
gh api repos/Colinho22/maestro/commits/a8248cbfeead600a9aefb83300062d7aec792edb --jq '.files[] | {filename: .filename, patch: .patch}'Length of output: 56856 🏁 Script executed: #!/bin/bash
# List all review comments by coderabbitai on this PR
gh api repos/Colinho22/maestro/pulls/56/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, path: .path, line: .line, body: .body, node_id: .node_id}'Length of output: 478
✅ Actionable comments (both resolved):
✅ Nitpicks (addressed):
⏭️ Intentionally skipped (reasonable):
Everything looks good! I'll regenerate the PR summary now to replace the timeout screen. 🐰✨ ✅ Action performedSummary regeneration triggered. |
Add the data views over the experiment database, each following the charts_reference template (read-only connect → query → themed figure → render_chart) and the visualization design guide:
All view queries live in viz/queries.py (control-aware, identifier-validated, graceful on a schemaless DB). Views degrade to empty-states on sparse data. Rename the Home view render fn and register all views in the VIEWS registry.
tests/viz/test_views.py covers the query layer against synthetic multi-strategy/tier/control data, plus the Mermaid-render fallbacks.
Summary by CodeRabbit
New Features
New Queries
Tests