feat(viz): implement the dashboard data views #56

Merged: Colinho22 merged 2 commits into main from viz-views on Jun 3, 2026

Colinho22 (Owner) commented Jun 3, 2026

Add the data views over the experiment database, each following the charts_reference template (read-only connect → query → themed figure → render_chart) and the visualization design guide:

  • Overview: operational summary (metric cards, runs-by-strategy split by success, cost per strategy).
  • Strategy Comparison (RQ1/RQ2): grouped entity/relationship F1 bars per strategy, with tier/model filters and an F1/precision/recall toggle; controls excluded.
  • Pareto (RQ4): entity-ID F1 vs cost and vs latency scatter, colored by strategy and shaped by tier, with a per-run detail table (no hover on static figures).
  • Run Detail: per-run input/ground-truth (read from the file system via experiment_config.INPUTS), generated diagram + parse badge, metric breakdown bar, and the multi-step sub-call trace.
  • Hallucination Taxonomy (RQ3): stacked entity/relationship error-count bars per strategy, with a view-local four-category error palette.
  • Diagram Visualizer (intentional addition beyond the five-view spec): ground-truth vs generated diagram side by side, with a Code/Visualization toggle. Visualization renders via the mmdc CLI — the same engine the metric pipeline uses for parses_valid — so the picture matches the recorded validity and is deterministic/reproducible. Falls back to source when mmdc is unavailable or a source fails to render.
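The fallback behaviour described for the Diagram Visualizer could look roughly like the sketch below. It is not the code in src/maestro/viz/mermaid_render.py: the names mmdc_available and render_mermaid_svg come from this PR, but the bodies, the timeout_s parameter, and the temp-file names are assumptions. The only mmdc options used are -i (input file) and -o (output file).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def mmdc_available() -> bool:
    """Return True when the mmdc executable can be found on PATH."""
    return shutil.which("mmdc") is not None


def render_mermaid_svg(source: str, timeout_s: float = 30.0) -> Optional[str]:
    """Render Mermaid source to an SVG string, or return None on any failure.

    timeout_s is an assumed default, not a value taken from the PR.
    """
    if not source or not source.strip():
        return None  # nothing to render
    if not mmdc_available():
        return None  # caller falls back to showing the source
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram.mmd"
        out = Path(tmp) / "diagram.svg"
        src.write_text(source, encoding="utf-8")
        try:
            subprocess.run(
                ["mmdc", "-i", str(src), "-o", str(out)],
                check=True,
                capture_output=True,
                timeout=timeout_s,
            )
        except (OSError, subprocess.SubprocessError):
            # Covers a missing binary, a non-zero exit code, and a timeout.
            return None
        if not out.exists():
            return None
        return out.read_text(encoding="utf-8")
```

A view would call render_mermaid_svg(source) and show the raw source whenever the result is None, so an unavailable CLI and an unparseable diagram take the same fallback path.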

All view queries live in viz/queries.py (control-aware, identifier-validated, graceful on a schemaless DB). Views degrade to empty-states on sparse data. Rename the Home view render fn and register all views in the VIEWS registry.
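As an illustration of what "identifier-validated" and "graceful on a schemaless DB" can mean in practice, here is a minimal sketch using the standard sqlite3 module. The helper names (connect_read_only, distinct_values, _table_exists) and the tier column are hypothetical and are not taken from viz/queries.py; only the run_configs table name appears elsewhere in this PR.

```python
import re
import sqlite3
from pathlib import Path

# Assumed rule: identifiers are plain ASCII names, so they are safe to interpolate.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def connect_read_only(db_path):
    """Open the SQLite file read-only via a file: URI (writes raise an error)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    return conn


def _table_exists(conn, table):
    """Check sqlite_master so queries can no-op when the schema is missing."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def distinct_values(conn, table, column):
    """Return sorted distinct non-null values, or [] when the table is absent."""
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if not _table_exists(conn, table):
        return []  # schemaless DB: empty default instead of an exception
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL ORDER BY {column}"
    ).fetchall()
    return [row[0] for row in rows]
```

Values are always bound with ? placeholders; only table and column names are interpolated, which is why they are checked against the pattern first.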

tests/viz/test_views.py covers the query layer against synthetic multi-strategy/tier/control data, plus the Mermaid-render fallbacks.

Summary by CodeRabbit

  • New Features

    • Diagram Visualizer with side-by-side code/visual toggle and graceful rendering fallback.
    • New dashboard pages: Overview, Strategy Comparison, Pareto, Run Detail, Hallucination Taxonomy.
    • Interactive filtering by strategy, tier, and model across views.
    • Mermaid rendering helper that converts diagram source to SVG when available.
  • New Queries

    • Read-only dashboard queries: summaries, per-strategy metrics/costs, Pareto points, run lists/details, and taxonomy counts (safe empty-DB behavior).
  • Tests

    • End-to-end tests covering queries, views, taxonomy logic, rendering fallbacks, and empty-schema safety.

Colinho22 added this to the 📊 Analysis milestone Jun 3, 2026
Colinho22 self-assigned this Jun 3, 2026
Colinho22 added the enhancement (New feature or request) label Jun 3, 2026
Colinho22 linked an issue Jun 3, 2026 that may be closed by this pull request
coderabbitai (Bot) commented Jun 3, 2026

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e100fb4b-38de-4a81-a068-5a2e827418d3

📥 Commits

Reviewing files that changed from the base of the PR and between 8965696 and a8248cb.

📒 Files selected for processing (10)
  • src/maestro/viz/mermaid_render.py
  • src/maestro/viz/queries.py
  • src/maestro/viz/views/__init__.py
  • src/maestro/viz/views/diagram_visualizer.py
  • src/maestro/viz/views/hallucination.py
  • src/maestro/viz/views/overview.py
  • src/maestro/viz/views/pareto.py
  • src/maestro/viz/views/run_detail.py
  • src/maestro/viz/views/strategy_comparison.py
  • tests/viz/test_views.py

📝 Walkthrough

Implements a Streamlit visualization dashboard: adds Mermaid-to-SVG rendering, a robust read-only SQL query layer for dashboard metrics and taxonomy, a concrete view registry with seven Streamlit views, and comprehensive tests validating queries, views, and Mermaid-fallback behavior.

Changes

Streamlit Viz Dashboard

| Layer | File(s) | Summary |
|---|---|---|
| Mermaid SVG rendering | src/maestro/viz/mermaid_render.py | mmdc_available() and render_mermaid_svg() provide CLI-based Mermaid-to-SVG conversion with temp-file I/O, timeout handling, and graceful None-on-failure fallback. |
| Dashboard query layer | src/maestro/viz/queries.py | Adds _SUCCESS_SQL and 14+ read-only query functions plus taxonomy constants: overview summary, per-strategy success/cost/metric means, distinct filters (tiers/models/strategies), Pareto points, run detail/sub-results, and taxonomy aggregations. All queries return safe empty defaults when tables are missing. |
| View registry and home page | src/maestro/viz/views/__init__.py | Replaces placeholder wiring with concrete imports and _render_home(); VIEWS explicitly maps nav labels to each view module's render callable. |
| Diagram Visualizer | src/maestro/viz/views/diagram_visualizer.py | Side-by-side ground-truth vs generated Mermaid source with Code/Visualization toggle; uses Mermaid renderer when available and safe file-read helpers. |
| Hallucination Taxonomy | src/maestro/viz/views/hallucination.py | Tier-filtered, per-strategy stacked-bar charts for entity and relationship taxonomy error counts with color/contrast helpers and centered numeric labels. |
| Overview & Strategy Comparison | src/maestro/viz/views/overview.py, src/maestro/viz/views/strategy_comparison.py | Overview shows headline metrics and stacked runs/cost charts; Strategy Comparison computes per-strategy mean metrics and renders grouped-bar charts with tier/model/measure filters. |
| Pareto & Run Detail | src/maestro/viz/views/pareto.py, src/maestro/viz/views/run_detail.py | Pareto plots entity-ID F1 vs cost/latency and shows per-run tables. Run Detail displays IO, generated diagram, metric breakdown, and expandable sub-call traces. |
| Test suite | tests/viz/test_views.py | In-memory SQLite tests covering query outputs, view registry presence, mermaid rendering fallback, taxonomy aggregation, run-detail/sub-results, and schemaless-regression safety. |
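The view registry summarized above maps navigation labels to render callables. A minimal sketch of that pattern follows. The stub render functions, the exact label strings, and render_selected are hypothetical stand-ins; the real views draw Streamlit widgets rather than returning strings.

```python
from typing import Callable, Dict


def _make_stub(title: str) -> Callable[[], str]:
    """Build a placeholder render function (stand-in for a real view)."""
    def render() -> str:
        return f"rendering {title}"
    return render


# Insertion order doubles as navigation order (dicts preserve it).
VIEWS: Dict[str, Callable[[], str]] = {
    "Home": _make_stub("Home"),
    "Overview": _make_stub("Overview"),
    "Strategy Comparison": _make_stub("Strategy Comparison"),
    "Pareto": _make_stub("Pareto"),
    "Run Detail": _make_stub("Run Detail"),
    "Hallucination Taxonomy": _make_stub("Hallucination Taxonomy"),
    "Diagram Visualizer": _make_stub("Diagram Visualizer"),
}


def render_selected(label: str) -> str:
    """Look up the chosen nav label and call its render function."""
    try:
        render = VIEWS[label]
    except KeyError:
        raise ValueError(f"unknown view: {label!r}") from None
    return render()
```

An explicit mapping keeps the set of pages visible in one place, which is also what makes a "registry presence" test straightforward to write.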

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • Colinho22/maestro#54: Extends the Streamlit viz scaffold and adds query helpers and concrete view wiring referenced in this PR.
  • Colinho22/maestro#55: Related changes to Home page wiring and chart utilities used by the new Home view.

Poem

A rabbit peers at charts galore,
Diagrams, metrics, code and more,
I nibble bugs and stitch the view,
SVGs and tests — all green and true 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(viz): implement the dashboard data views' clearly and concisely summarizes the primary change: implementing multiple data visualization views for a dashboard system. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Docstring Coverage (Src Only) | ✅ Passed | All 26 public definitions (functions and modules) in changed src/ files have docstrings; coverage is 100%, exceeding the 80% threshold. |


coderabbitai (Bot) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
src/maestro/viz/queries.py (1)

369-379: 💤 Low value

Consider explicit column selection over SELECT * with joins.

SELECT c.*, r.*, m.* across JOINed tables risks column shadowing if any tables share column names beyond the join key (run_id). The last value wins in the resulting dict. While this works today, explicit column enumeration would be more robust if schemas evolve.

Fine to keep as-is for now given this is a read-only dashboard query.
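The shadowing risk can be reproduced with a small in-memory example. This is a sketch under assumptions: the two-table schema and the sample values are invented for illustration and only the table names run_configs and run_results, and the shared run_id key, come from this review. It shows that a star-join returns more columns than survive as dict keys, while an aliased column list keeps every key distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE run_configs (run_id TEXT, strategy TEXT);
    CREATE TABLE run_results (run_id TEXT, cost_usd REAL);
    INSERT INTO run_configs VALUES ('r1', 'zero_shot');
    INSERT INTO run_results VALUES ('r1', 0.5);
    """
)

# Star-join: both tables contribute a column named run_id.
star_row = conn.execute(
    "SELECT c.*, r.* FROM run_configs c "
    "JOIN run_results r ON r.run_id = c.run_id WHERE c.run_id = ?",
    ("r1",),
).fetchone()
star_columns = star_row.keys()   # four names, run_id listed twice
star_dict = dict(star_row)       # duplicate names collapse into one key

# Explicit, aliased column list: every key is unique and documented.
explicit_row = conn.execute(
    "SELECT c.run_id AS run_id, c.strategy AS strategy, r.cost_usd AS cost_usd "
    "FROM run_configs c JOIN run_results r ON r.run_id = c.run_id "
    "WHERE c.run_id = ?",
    ("r1",),
).fetchone()
explicit_dict = dict(explicit_row)
```

Here the duplicated column is the join key, so both copies hold the same value and nothing is lost. The problem only becomes visible when two tables share a non-key column name with different contents.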

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/maestro/viz/queries.py` around lines 369 - 379, The query uses SELECT
c.*, r.*, m.* which can cause silent column shadowing when joined tables share
column names; update this SQL in the block that executes conn.execute(...) with
run_id so it explicitly lists each needed column and/or uses aliases (e.g.,
c.some_col AS c_some_col, r.some_col AS r_some_col, m.some_col AS m_some_col)
for all columns returned from run_configs, run_results, and metric_results to
avoid collisions before creating dict(row). Ensure the returned dict keys match
the new aliases or adjust downstream code that consumes dict(row).
tests/viz/test_views.py (2)

111-129: 💤 Low value

Optional: extract a populated_conn pytest fixture.

The conn = _conn(); _populate(conn) pair is repeated across ~10 tests. A fixture would cut the boilerplate and let you close connections deterministically (the in-memory connections are currently never closed).

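♻️ Example fixture

```python
import pytest


@pytest.fixture
def populated_conn():
    # _conn() and _populate() are the existing helpers in tests/viz/test_views.py.
    conn = _conn()
    _populate(conn)
    yield conn
    # Teardown: close the in-memory connection once the test finishes.
    conn.close()
```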
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/viz/test_views.py` around lines 111 - 129, Extract a pytest fixture to
avoid repeating conn setup: create a fixture named populated_conn that calls
_conn(), invokes _populate(conn) and yields the connection, then closes it after
the test; update tests that currently call conn = _conn(); _populate(conn) to
accept the populated_conn fixture instead and remove manual close logic
(reference symbols: _conn, _populate, tests in tests/viz/test_views.py).

321-341: 💤 Low value

Optional: section header doesn't match the tests below it.

The Graceful degradation … schemaless DB banner at Lines 321-323 sits directly above the two Mermaid render tests, which belong under their own "Mermaid rendering" header. The actual schemaless test starts at Line 343.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/viz/test_views.py` around lines 321 - 341, The section header "Graceful
degradation — every query no-ops on an empty (schemaless) DB" is misplaced above
the Mermaid tests (test_mermaid_render_blank_source_returns_none,
test_mermaid_render_handles_missing_mmdc) — move that banner (or replace it) so
the Mermaid tests are grouped under a new "Mermaid rendering" header and the
schemaless DB header sits immediately above the actual schemaless test (starting
at the later test), ensuring test organization matches headers and keeping
function names test_mermaid_render_blank_source_returns_none and
test_mermaid_render_handles_missing_mmdc together under the new header.
src/maestro/viz/views/pareto.py (1)

90-105: ⚡ Quick win

Numeric Pareto fields are non-null (so rounding/scatter won’t TypeError under current schema)

pareto_points pulls run_results.cost_usd, run_results.duration_ms, and metric_results.entity_id_f1 via INNER JOINs, and those columns are declared NOT NULL in src/maestro/db/client.py. The write paths type them as non-optional (RunResult.cost_usd/duration_ms, MetricResult.entity_id_f1), and the metrics pipeline returns entity_id_f1=0.0 even when ground truth is missing—so round(p["cost_usd"], ...), p["duration_ms"], round(p["entity_id_f1"], ...), and the ax.scatter(...) calls in src/maestro/viz/views/pareto.py should not see None.

Optional: if you expect legacy DBs with NULLs in those columns, filter out rows with NULLs before rounding/scattering.
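If such a defensive filter were wanted, it could be as small as the sketch below. The field names come from the comment above; the helper name drop_incomplete and the list-of-dicts shape of the points are assumptions.

```python
REQUIRED_FIELDS = ("cost_usd", "duration_ms", "entity_id_f1")


def drop_incomplete(points):
    """Keep only points where every numeric Pareto field is present and non-null."""
    return [
        p for p in points
        if all(p.get(field) is not None for field in REQUIRED_FIELDS)
    ]
```

Note that a value of 0.0 is kept, since the check is "is not None" rather than truthiness. That matters because entity_id_f1 can legitimately be 0.0.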

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/maestro/viz/views/pareto.py` around lines 90 - 105, The numeric fields
(cost_usd, duration_ms, entity_id_f1) are non-null under current schema, but to
be defensive against legacy DBs you should filter out any points with null/None
values before rounding or plotting: update the code that builds/uses the points
list (the comprehension that feeds st.dataframe and the plotting code that calls
ax.scatter) to first filter points where p["cost_usd"] is not None and
p["duration_ms"] is not None and p["entity_id_f1"] is not None, then perform
round(...) and plotting only on that filtered list; reference the existing
symbols points, st.dataframe, strategy_display_name, and ax.scatter when making
the change.
src/maestro/viz/views/hallucination.py (1)

128-138: 💤 Low value

White value labels have low contrast on the lighter segment colors.

White text on #95A5A6 (gray, ~2:1) and #3498DB (blue, ~3:1) fails WCAG AA for the small fontsize=8 labels. Consider choosing label color per-segment based on luminance for legibility.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/maestro/viz/views/hallucination.py` around lines 128 - 138, The white
labels are low-contrast on some segment colors; in the loop that draws labels
(the for xi, (h, b) in enumerate(zip(heights, bottom)) block where ax.text(...)
is called), compute the segment fill color and its relative luminance (e.g.,
from ax.patches[xi].get_facecolor() or the colors list used to draw the bars),
then choose label color based on that luminance (dark text like "black" for
light fills, white for dark fills) before calling ax.text; implement a simple
luminance check (Y = 0.2126*R + 0.7152*G + 0.0722*B) and a threshold (~0.5 or
tuned) to pick the contrasting label color so small fontsize=8 labels meet
legibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/maestro/viz/views/diagram_visualizer.py`:
- Around line 144-149: The helper _read_or_note currently only catches OSError
around Path(path).read_text(encoding="utf-8"), but read_text can raise
UnicodeDecodeError for non-UTF-8 files; update the except clause to catch both
OSError and UnicodeDecodeError (e.g., except (OSError, UnicodeDecodeError) as e)
when calling Path.read_text so a missing or non-UTF-8 file returns the "(file
not found: ...)" note instead of crashing; keep the return string behavior and
reference the Path.read_text call and _read_or_note function.

In `@src/maestro/viz/views/run_detail.py`:
- Around line 84-95: _fmt_ts currently calls datetime.fromisoformat(ts) without
guarding against None or empty values, causing a TypeError; update _fmt_ts to
first check that ts is a truthy string (e.g., if not ts: return ts) before
attempting to parse, then proceed with the existing try/except ValueError and
call format_for_display on the parsed datetime; reference symbols: _fmt_ts,
datetime.fromisoformat, format_for_display, viz_settings.current_settings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4de40a71-44b0-48a2-9ac6-f81c0bbc6000

📥 Commits

Reviewing files that changed from the base of the PR and between 8965696 and c6768ae.

- _read_or_note (run_detail, diagram_visualizer): also catch
  UnicodeDecodeError so a non-UTF-8 input/ground-truth file returns a note
  instead of crashing the view.
- _fmt_ts (run_detail): guard against None/empty timestamps and catch
  TypeError, not just ValueError, so a partial/legacy row can't crash the
  run picker.
- run_detail query: replace SELECT c.*, r.*, m.* with an explicit aliased
  column list — all three tables carry run_id, and the star-join silently
  shadowed it; the explicit list also documents exactly what the views read.
- hallucination: pick value-label color by segment-fill luminance (white on
  dark, #333333 on light), per the design guide's cell-annotation rule, so
  labels stay legible on the lighter error colors.
- tests: split the misplaced section header so the Mermaid-render tests and
  the schemaless-DB test sit under accurate banners.

Skipped: Pareto None-filtering (inner joins + NOT NULL schema already
guarantee non-null; would be dead code) and the populated_conn fixture
(cosmetic; in-memory connections need no teardown).
Colinho22 (Owner, Author) commented

@coderabbitai I addressed the comments you raised in a8248cb. Please check again :) and also update your initial comment to reflect the findings (it currently shows the timeout screen).

coderabbitai (Bot) commented Jun 3, 2026
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR review comments from coderabbitai
gh pr view 56 --repo Colinho22/maestro --json reviews,comments

# Get the commit details
gh api repos/Colinho22/maestro/commits/a8248cbfeead600a9aefb83300062d7aec792edb --jq '.commit.message'

# Get the commit diff/files changed
gh api repos/Colinho22/maestro/commits/a8248cbfeead600a9aefb83300062d7aec792edb --jq '.files[] | {filename: .filename, patch: .patch}'

Length of output: 56856


🏁 Script executed:

#!/bin/bash
# List all review comments by coderabbitai on this PR
gh api repos/Colinho22/maestro/pulls/56/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, path: .path, line: .line, body: .body, node_id: .node_id}'

Length of output: 478



@Colinho22 Thanks for the follow-up! I've reviewed commit a8248cb — here's a summary of what was addressed:

✅ Actionable comments (both resolved):

  • _read_or_note in both diagram_visualizer.py and run_detail.py: now catches (OSError, UnicodeDecodeError) so non-UTF-8 files return a note instead of crashing.
  • _fmt_ts in run_detail.py: guarded against None/empty timestamps and broadened the except to (ValueError, TypeError).

✅ Nitpicks (addressed):

  • run_detail query in queries.py: SELECT c.*, r.*, m.* replaced with an explicit aliased column list — eliminates the silent run_id shadowing.
  • hallucination.py: Added _label_color() luminance helper; value labels now use #333333 on light fills and white on dark fills for WCAG legibility.
  • tests/viz/test_views.py: Section headers corrected — Mermaid render tests now sit under their own "Mermaid rendering" banner, and the schemaless-DB header is directly above the relevant test.

⏭️ Intentionally skipped (reasonable):

  • Pareto None-filtering: INNER JOINs + NOT NULL schema already prevent nulls; filtering would be dead code.
  • populated_conn fixture: cosmetic refactor; in-memory connections need no teardown.

Everything looks good! I'll regenerate the PR summary now to replace the timeout screen.

🐰✨

✅ Action performed

Summary regeneration triggered.

Colinho22 merged commit f548501 into main Jun 3, 2026
2 checks passed
Colinho22 deleted the viz-views branch June 3, 2026 19:50
Development

Successfully merging this pull request may close these issues.

feat: viz views
