feat(metrics): user-defined metrics catalog + monitoring (tripl-dxhp) #27

Merged: vladenisov merged 14 commits into main from feat/metrics-catalog on Jun 30, 2026

@vladenisov (Owner)

Summary

Ships the Metrics catalog + monitoring epic (tripl-dxhp, 9/9 slices): a metric is like an event but user-defined, monitored with the same anomaly detection + alerting.

A MetricDefinition (global, project-scoped; lifecycle draft/active/archived) is one of three kinds:

  • sql — a user SELECT returning a per-bucket value (validated, no bound params)
  • fact_aggregation — count/sum/avg/min/max/count_distinct over a measure column of a table/base query, with optional filter + breakdowns
  • event_composition — single event count, ratio A÷B, or A÷distinct-users, derived from already-collected event series

…with breakdowns, app-version/platform splits, anomaly monitoring and opt-in alerting.
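
The three kinds listed above behave like a tagged union keyed by `kind`. The following is a minimal sketch of that idea, not the shipped schema: it uses standard-library dataclasses as a stand-in for the Pydantic models, and every class, field, and function name in it (`SqlMetric`, `numerator_event`, `parse_metric`, and so on) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class SqlMetric:
    name: str
    query: str  # a user SELECT returning one value per bucket
    kind: str = "sql"


@dataclass(frozen=True)
class FactAggregationMetric:
    name: str
    table: str
    aggregation: str  # count/sum/avg/min/max/count_distinct
    measure_column: Optional[str] = None
    kind: str = "fact_aggregation"


@dataclass(frozen=True)
class EventCompositionMetric:
    name: str
    numerator_event: str
    denominator_event: Optional[str] = None  # None means a single event count
    kind: str = "event_composition"


Metric = Union[SqlMetric, FactAggregationMetric, EventCompositionMetric]

# Hypothetical lookup from the discriminator value to the concrete class.
_BY_KIND = {
    "sql": SqlMetric,
    "fact_aggregation": FactAggregationMetric,
    "event_composition": EventCompositionMetric,
}


def parse_metric(payload: dict) -> Metric:
    """Pick the concrete class from the payload's `kind` discriminator."""
    data = dict(payload)
    kind = data.pop("kind", None)
    if kind not in _BY_KIND:
        raise ValueError(f"unknown metric kind: {kind!r}")
    return _BY_KIND[kind](**data)
```

Because each class accepts only its own fields, a payload that mixes fields from two kinds fails at construction time rather than producing a half-valid definition.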

What's included

  • Models/migrations: MetricDefinition + 4 enums; metric_values/metric_value_breakdowns; metric anomaly scope (MetricScopeType.metric, nullable MetricAnomaly.scan_config_id, partial unique index). 3 migrations, single head a1b2c3d4e5f6.
  • Adapters: get_time_bucketed_aggregate (+breakdown) across ClickHouse/Postgres/BigQuery; measure_validator (identifier allowlist, SQL-fragment/SELECT safety, UNION ban) preserving the no-bound-params escaping model.
  • Catalog API: kind-discriminated Pydantic schemas (every identifier/SQL field gated by the validator), service (CRUD/list/bulk/reorder/move), /projects/{slug}/metrics router.
  • Collection: collect_metric_definitions worker + composition evaluator (divide-by-zero → gap) + distinct-user denominator; beat check_metric_definitions_due.
  • Monitoring/alerting: metric anomaly recompute (count-shaped vs fractional value-kind to avoid false positives); per-project detect_metrics, per-rule include_metrics (opt-in, default off).
  • Series read: densified series + breakdown/version + catalog list enrichment (latest value/signal/spark).
  • Frontend: Metrics catalog page, kind-specific create/edit form, metric drilldown reusing the monitoring tabs, nav + routes.
  • Tests/docs: 6 end-to-end pipeline tests; docs across concepts/feature-reference/architecture/anomaly-detection/alerting.

Quality

  • Full backend suite 801 passed; frontend tsc + eslint clean, 53 vitest; mypy/ruff clean.
  • Security re-verify: user-controlled SQL injection surface closed end-to-end.
  • openapi.json regenerated (additive); graphify graph current.

Test plan

  • CI green on PG (migrations upgrade/downgrade)
  • Manual smoke: create one metric per kind, verify collection populates series + anomaly flags
  • Verify an alert rule with include_metrics delivers a metric anomaly

Follow-ups (filed)

  • Fractional-metric anomaly sensitivity (int detector)
  • Catalog UI bulk actions + drag-reorder + edit-mode config visibility
  • Explicit per-composition interval + value-column conventions

vladenisov and others added 14 commits June 29, 2026 15:38

First foundational piece of the metrics catalog + monitoring epic (tripl-dxhp.1).

MetricDefinition is a global, project-scoped catalog entity (no branch_id),
mirroring EventType/Event, with a simple draft/active/archived lifecycle and a
hybrid collection binding: sql/fact_aggregation carry their own data_source_id +
interval; event_composition references canonical events via numerator/denominator
FKs. Adds MetricKind/MetricStatus/MetricAggregation/MetricComposition enums and
migration d6e7f8a9b0c1 (4 native PG enum types + metric_definitions table).

5 model tests; mypy + ruff clean; single alembic head; revision-graph test passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Metrics epic (tripl-dxhp) backend slices .2/.3/.4:

- .2 value storage: MetricValue + MetricValueBreakdown models (metric-scoped,
  Float value, optional scan_config_id grid alignment), dialect-aware UPSERT and
  window-delete helpers in metric_rows.py, migration e9f0a1b2c3d4.
- .3 adapters: get_time_bucketed_aggregate (+breakdown) across ClickHouse,
  Postgres and BigQuery (count/sum/avg/min/max/count_distinct); new
  measure_validator (identifier allowlist, SQL-fragment/SELECT safety, UNION ban,
  paren-depth FROM detection) preserving the no-bound-params escaping model;
  ClickHouse interval validation on new and pre-existing count methods. Existing
  count path SQL unchanged.
- .4 catalog API: kind-discriminated Pydantic schemas with every identifier/SQL
  field gated through measure_validator at the boundary; service (CRUD, list,
  bulk, reorder, move) and thin /projects/{slug}/metrics router (EditorUserDep +
  audit).
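
Since the adapters interpolate user input without bound parameters, the validator is what keeps that input safe. The sketch below illustrates an identifier allowlist and a keyword ban under assumed rules. `validate_identifier` is named in this PR, but its signature, the regular expressions, and the `validate_select` helper are assumptions for illustration and may differ from measure_validator.

```python
import re

# Assumed allowlist: letters, digits and underscores, optionally dotted
# (schema.table.column). The real validator's rules may differ.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")
_BANNED = re.compile(r"\b(UNION|INSERT|UPDATE|DELETE|DROP|ALTER)\b", re.IGNORECASE)


def validate_identifier(value: str) -> str:
    """Return the identifier unchanged, or raise if it is not allowlisted."""
    if not _IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return value


def validate_select(sql: str) -> str:
    """Accept a single SELECT statement that contains no banned keyword."""
    text = sql.strip().rstrip(";").strip()
    if ";" in text:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)select\b", text):
        raise ValueError("only SELECT statements are allowed")
    if _BANNED.search(text):
        raise ValueError("banned keyword in SELECT")
    return text
```

This check is deliberately conservative: a banned keyword inside a string literal is rejected too, which trades a few false positives for a simpler rule.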

openapi.json regenerated (additive: 5 routes, 16 schemas). 255 tests pass; mypy
and ruff clean. Security re-verify: user-controlled SQL injection surface closed.

Metrics epic (tripl-dxhp) wave 1, slices .5 and .7:

- .5 collection: collect_metric_definitions worker task. fact/sql kinds run via
  the M3 adapter aggregations over the metric data source on interval-aligned
  chunks (window-delete-then-upsert into metric_values/breakdowns); event_composition
  uses a pure evaluator (worker/analyzers/metric_composition) that densifies
  numerator/denominator onto the shared scan grid, divide-by-zero -> None,
  per_distinct_user denominator via warehouse count_distinct. Beat dispatch
  check_metric_definitions_due with a dedicated advisory lock + stale-running reap;
  inline last_collected_at/status/error.
- .7 series read: metric_series_service (densify + anomaly join + forecast, reusing
  metrics_service helpers read-only) exposing GET series/breakdowns/versions under
  /projects/{slug}/metrics; catalog list enriched with latest value + signal + spark
  via batched window-function queries. Float-shaped response models preserve <1 ratios.
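
The ratio evaluation described in .5 can be sketched as a pure function over a shared grid. This is an illustration only: `evaluate_ratio` is a hypothetical name, buckets are plain string keys, and treating a bucket missing from either series as 0 is an assumption about how the densify step behaves for event counts.

```python
from typing import Dict, List, Optional


def evaluate_ratio(
    grid: List[str],
    numerator: Dict[str, float],
    denominator: Dict[str, float],
) -> List[Optional[float]]:
    """Divide two bucketed series on a shared grid.

    Buckets missing from either series count as 0 (assumption); a zero
    denominator yields None (a gap) instead of raising or emitting infinity.
    """
    out: List[Optional[float]] = []
    for bucket in grid:
        num = numerator.get(bucket, 0.0)
        den = denominator.get(bucket, 0.0)
        out.append(num / den if den else None)
    return out
```

Returning `None` keeps the gap distinguishable from a genuine 0.0 ratio, which matters once the value feeds anomaly detection.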

openapi.json regenerated (additive: 3 routes, 6 schemas, enriched list item).
278 tests pass; mypy + ruff clean.

Carried to .6 (value-kind work): metric_values.value is NOT NULL so divide-by-zero
buckets are skipped; .6 should add a value-kind flag (nullable value + null-aware
zero-fill) and the metric anomaly scope (scope_ref=metric_definition_id, plus a
(scope_ref,bucket) index).
…/detail (.8)

Metrics epic (tripl-dxhp) wave 2 — slices .6 and .8:

- .6 anomaly scope + alerting: MetricScopeType += metric (migration a1b2c3d4e5f6:
  ALTER TYPE in an autocommit block, a partial unique index on
  (scope_type,scope_ref,bucket) WHERE scan_config_id IS NULL, nullable
  MetricAnomaly.scan_config_id + (scope_ref,bucket) index, project_anomaly_settings
  .detect_metrics, alert_rules.include_metrics). detect.py recomputes metric-scope
  anomalies (project-global, NULL scan_config_id, idempotent via the partial index);
  a value-kind helper (is_count_shaped) gates zero-fill / min_expected_count so
  ratio/avg metrics don't false-fire, and the series densify renders fractional gaps
  as null. Scope threaded through signals/dispatch/alert_payload/alerting_matching/
  get_active_signals; include_metrics gates delivery (default OFF); per-project
  canonical AlertRuleState avoids multi-scan-config duplicate deliveries.
- .8 frontend: metricsCatalogApi + types, MetricsPage (MiniStat + table, latest
  value/signal + sparkline, filters), MetricForm with kind-specific config sections
  and per-kind validation mirroring the backend discriminated union (kind/config
  immutable on edit), metric drilldown via a new 'metric' MonitoringScope reusing
  MonitoringDetailPage, nav item + lazy routes. Anomaly row visuals gated on signal
  state.
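
The value-kind gate in .6 comes down to choosing a fill value for buckets with no data. A minimal sketch, assuming the count-shaped set is count and count_distinct (the split this PR states for fact metrics); the signatures of `is_count_shaped` and `densify` here are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed split: count / count_distinct are count-shaped; sum, avg, min,
# max and any ratio are fractional.
_COUNT_SHAPED = {"count", "count_distinct"}


def is_count_shaped(aggregation: str, is_ratio: bool = False) -> bool:
    return not is_ratio and aggregation in _COUNT_SHAPED


def densify(
    grid: List[str],
    values: Dict[str, float],
    count_shaped: bool,
) -> List[Optional[float]]:
    """Fill missing buckets with 0 for counts and None for fractional metrics."""
    fill: Optional[float] = 0.0 if count_shaped else None
    return [values.get(bucket, fill) for bucket in grid]
```

Zero-filling a ratio or average would invent a sharp drop to 0 that a detector would flag, so fractional metrics get a gap instead.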

Integration: regenerated openapi.json (additive: include_metrics/detect_metrics,
metric enum value, nullable MetricSignalResponse.scan_config_id) and frontend
api.gen.ts; tightened the catalog list-enrichment anomaly query to filter
scope_type='metric'. Full backend suite 801 passed; frontend tsc + eslint clean,
53 vitest pass.

Metrics epic (tripl-dxhp) wave 3, slice .9:

- backend/src/tripl/tests/test_metrics_pipeline_e2e.py: 6 end-to-end tests driving
  definition -> collection -> anomaly recompute -> alert dispatch for every kind
  (fact_aggregation, sql, event_composition single/ratio/per_distinct_user),
  including the ratio divide-by-zero gap and the include_metrics opt-in gate
  (delivers when True, none when False). Green.
- docs: documented Metrics as a first-class concept alongside Events across
  concepts.md, feature-reference.md, architecture.md, anomaly-detection.md and
  alerting.md (three kinds, global/project-scoped lifecycle, collection +
  scheduling, count-shaped vs fractional value-kind, detect_metrics / include_metrics).

Browser E2E (Playwright) deferred: the repo has no Playwright harness and there is
no live stack in this environment; the catalog list/form/drilldown flow is covered
by the wave-2 vitest suite + MonitoringDetailPage tests. No API drift (openapi
regenerated, unchanged).

Fact-tables epic (tripl-ysji) wave 1, slices .1 and .6:

- .1 FactTable foundation: new FactTable model (project-scoped: data_source_id,
  sql SELECT, timestamp_column, columns/identifier_columns/row_filters JSON),
  'fact' added to the MetricKind enum, metric_definitions.fact_table_id FK,
  migration a2b3c4d5e6f7 (ALTER TYPE ADD VALUE in an autocommit block + fact_tables
  table + FK; single head). The collect task dispatch now fails with a clear
  NotImplementedError for kinds without a collector instead of a cryptic KeyError.
  (Fact tables are defined separately; metrics will be built on top of them in
  later slices. The legacy inline fact_aggregation kind is being removed in .4.)
- .6 metrics UI polish: metric drilldown shows "Back to metrics" + a Metrics
  breadcrumb (was Monitors/Back to events); MetricForm clears stale validation on
  kind change; the misleading fractional-metric forecast tail is dropped for
  catalog-metric series; lazy metric routes get a Suspense fallback so they no
  longer flash the previous page.

openapi.json + frontend api.gen.ts regenerated (additive: MetricKind gained
'fact'). Backend ruff/mypy + 26 tests + contract test green; frontend tsc/eslint +
43 vitest green.
…(ysji.2/.3)

Wave 2 of the fact-tables epic:
- Introspection service: runs a fact-table SELECT via the warehouse adapter,
  buckets column types (number/string/bool/timestamp), derives identifier
  candidates, returns JSON-safe sample rows. Scope-checks the data source to
  the project (via ScanConfig), re-validates SELECT safety as defense in depth,
  buckets Postgres interval as string, coerces non-finite floats to null, and
  redacts driver exceptions from the WARNING log tier.
- FactTable CRUD: kind-mirrored schemas (SQL/fragment/identifier validated via
  measure_validator), service (409 on duplicate name, project-scoped data
  source binding, append-at-end ordering), router /projects/{slug}/fact-tables
  with create/list/get/patch/delete + POST /preview.
- openapi.json regenerated (additive: 3 paths, FactTable* schemas).
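
The introspection rules above (four type buckets, interval as string, non-finite floats to null) can be sketched per sampled cell. This is an assumption-heavy illustration: it classifies Python values rather than driver column metadata, and `bucket_type` and `json_safe` are hypothetical names.

```python
import math
from datetime import date, datetime, timedelta
from typing import Any


def bucket_type(value: Any) -> str:
    """Map a Python value to one of number/string/bool/timestamp."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, (datetime, date)):
        return "timestamp"
    return "string"  # includes timedelta, i.e. an interval bucketed as string


def json_safe(value: Any) -> Any:
    """Make one sampled cell serialisable as strict JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        return None  # NaN / Infinity are not valid JSON
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, timedelta):
        return str(value)
    return value
```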

Gates: ruff + mypy clean (248 files), 71 fact-table tests + openapi contract
pass. Review: 2 HIGH (order-null 500, cross-project source binding) + 4 MEDIUM
fixed; 1 MEDIUM deferred to ysji.4 (column-name validation belongs at the
metric measure/distinct column gate where it reaches SQL).
…ggregation (ysji.4)

Wave 3 of the fact-tables epic.

ADD 'fact' kind: a metric built on a FactTable.
- Schema FactMetricCreate (single: fact_table_id + aggregation + measure/distinct/
  row_filter; ratio: numerator/denominator FactOperands, denominator may use a
  different fact table).
- Service validation: fact-table existence + project ownership; measure_column/
  distinct_column must pass validate_identifier AND be a column of the referenced
  fact table; row_filter must name a stored filter; per-aggregation required-field
  rules; data source comes from the FactTable.
- Collection _collect_fact (single/ratio) reuses the adapter
  get_time_bucketed_aggregate(+breakdown); ratio divides via evaluate_composition
  (divide-by-zero -> gap). Value-kind: count/count_distinct count-shaped;
  sum/avg/min/max and ratio fractional.

REMOVE fact_aggregation entirely (owner request, no back-compat):
- Drop MetricKind.fact_aggregation via PG enum type-recreation migration
  b3c4d5e6f7a8 (single head after a2b3c4d5e6f7; pre-flight DELETE makes the USING
  cast safe; restartable via DROP TYPE IF EXISTS; PG-guarded).
- Delete FactAggregationConfig/MetricCreate, the collector, all docstrings, and
  the frontend surface; regenerate openapi.json + api.gen.ts.

Hardening (review findings): row-filter fragments now reject SELECT/WITH
subqueries; runtime fact-table project-scope check in collection; empty-columns
guard; asserts -> explicit raises; idempotent enum migration; loud FE guard for
the deferred fact form.

Gates: ruff + mypy clean (248 files), 221 backend tests + openapi contract +
single alembic head, frontend tsc/eslint/vitest green. fact_aggregation: 0 refs.

Wave 4 (final epic slice), frontend.

- factTablesApi + src/types/factTables.ts (aliasing the generated FactTable*
  schemas) mirroring the metrics-catalog client/types.
- FactTables page: list + create/edit form with a SQL editor, a 'Preview
  columns' action (POST /fact-tables/preview) that renders sampled columns/types
  and persists them into the create payload, and a named row-filters editor.
- Rich 'fact' metric form: fact-table picker -> column/aggregation/filter
  dropdowns; composition toggle single vs ratio (numerator/denominator operands,
  each may reference a different fact table); client validation mirrors the
  backend required-field rules. Removes the wave-3 placeholder.
- Routes (/p/:slug/fact-tables[/new|/:id/edit]) + 'Fact tables' nav item +
  breadcrumb.

Quality: a11y (accessible row-filter labels, aria-required, no double tab stop),
stable list keys, exhaustive kind dispatch, type-guarded aggregation parsing.
Gates: tsc + eslint clean, 22 vitest tests. Epic tripl-ysji complete (6/6).

Extend create_demo_project to showcase the metrics catalog out of the box:
- an 'orders' FactTable (synthetic read-only SELECT, 6 introspected columns,
  a named 'completed' row filter);
- four MetricDefinitions, one per kind: a sql metric (Active Sessions), an
  event_composition ratio (Purchase conversion), a fact single (Revenue
  completed = sum(amount) filtered), and a fact ratio (Average order value =
  sum / count);
- fabricated MetricValue series (7d hourly for the ratio on the scan grid;
  30d daily for the sql/fact metrics) so the catalog and metric drilldowns
  render with data without the worker ever running.

Definitions are built from the Pydantic create schemas (to_create_values) so
all validators run and the config JSON is exact, then persisted as ORM rows to
preserve the seeder's single end-of-function commit.

Tests: 2 new demo assertions (catalog returns >=4 defs covering all kinds with
a spark; fact table exposes its named filter). ruff + mypy clean, 8 demo tests
pass.

The event_composition ratio was seeded as purchases/screen-views where both
use the same sinusoid base, so the ratio cancelled to a near-flat constant and
the catalog sparkline read as a dead line. Seed a gentle upward trend + daily
ripple instead (~0.04-0.11) so the demo ratio renders as a live series.
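
One way to produce a series with that shape is a linear trend plus a 24-hour sine ripple. The sketch below is not the seeder's code: the function name and the constants are assumptions chosen only so the values stay inside the stated ~0.04-0.11 band.

```python
import math
from typing import List


def seed_ratio_series(hours: int = 7 * 24) -> List[float]:
    """Hourly ratio values: a linear upward trend plus a 24-hour ripple."""
    values = []
    for h in range(hours):
        # Trend climbs from 0.055 to 0.095 across the window (assumed constants).
        trend = 0.055 + 0.04 * h / max(hours - 1, 1)
        # Ripple swings by at most 0.015 around the trend, once per day.
        ripple = 0.015 * math.sin(2 * math.pi * h / 24)
        values.append(round(trend + ripple, 4))
    return values
```

Because the numerator is no longer a scaled copy of the denominator, the ratio keeps visible structure instead of cancelling to a constant.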
… fact table)

Collect all fact metrics of a fact table in a single multi-aggregate query
instead of one query per metric.

Adapter contract (base.py): AggregateSpec + get_time_bucketed_multi_aggregate
and _breakdown, implemented across ClickHouse / Postgres / BigQuery. A per-metric
row filter becomes a per-aggregate CONDITIONAL aggregate (sumIf/countIf/-If;
FILTER (WHERE) / CASE) so differing filters share one scan. Empty conditional
groups are guarded to NULL (count sentinel in ClickHouse, NULLIF for count-style
in PG/BQ) so per-bucket values stay byte-identical to the per-metric path.
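
For the Postgres form, the conditional-aggregate idea can be sketched as string rendering. `AggregateSpec` is named in this PR, but the fields shown here, `render_aggregate`, and `render_select` are hypothetical, and the fragments are assumed to have already passed validation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AggregateSpec:
    alias: str
    func: str                          # "count", "sum", "avg", "min" or "max"
    column: Optional[str] = None       # None means count(*)
    row_filter: Optional[str] = None   # pre-validated SQL fragment


def render_aggregate(spec: AggregateSpec) -> str:
    """Render one aggregate with its own filter (Postgres FILTER syntax)."""
    expr = f"{spec.func}({spec.column or '*'})"
    if spec.row_filter:
        expr += f" FILTER (WHERE {spec.row_filter})"
        if spec.func == "count":
            # A filtered count over no rows is 0; map it to NULL so an
            # empty conditional group reads as "no data", as described above.
            expr = f"NULLIF({expr}, 0)"
    return f"{expr} AS {spec.alias}"


def render_select(bucket_expr: str, source: str, specs: List[AggregateSpec]) -> str:
    """Combine all aggregates of one fact table into a single scan."""
    cols = ", ".join(render_aggregate(s) for s in specs)
    return (
        f"SELECT {bucket_expr} AS bucket, {cols} "
        f"FROM ({source}) AS src GROUP BY 1 ORDER BY 1"
    )
```

The NULL guard matters because a per-metric query with the filter in its WHERE clause would return no row for such a bucket, whereas the shared scan always returns one.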

Collection: collect_fact_metrics_batch builds dedup'd specs per fact table,
chunks the covering window by the smallest replay_chunk_interval (disjoint-bucket
merge), runs one multi-aggregate query per fact table + one per breakdown
dimension, and assembles each metric — single, same-table ratio, and cross-table
ratio (operands gathered across fact tables in the same interval group, divided
via evaluate_composition, divide-by-zero -> gap). Per-metric upsert + isolated
error capture. The scheduler groups due fact metrics by interval into one batch
dispatch; sql/event_composition stay per-metric; per-metric _collect_fact is
kept as the manual-recollect fallback.
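
The grouping and dedup step can be pictured as a two-level plan: interval first, then fact table, with identical aggregates listed once. A rough sketch with hypothetical names throughout (`DueMetric`, `plan_batches`, and string keys standing in for real specs):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (metric_id, fact_table_id, interval, aggregate_key); all fields assumed.
DueMetric = Tuple[str, str, str, str]


def plan_batches(due: Iterable[DueMetric]) -> Dict[str, Dict[str, List[str]]]:
    """Group due metrics by interval, then by fact table, deduplicating
    identical aggregates so each is computed once per query."""
    plan: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for _metric_id, fact_table_id, interval, aggregate_key in due:
        specs = plan[interval][fact_table_id]
        if aggregate_key not in specs:
            specs.append(aggregate_key)
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {interval: dict(tables) for interval, tables in plan.items()}
```

Each inner list then corresponds to one multi-aggregate query, and each metric picks its value back out by aggregate key.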

Gates: ruff + mypy clean (248 files), full backend suite 915 passed; 180 impacted
tests reverified. 1 CRITICAL + 3 HIGH review findings fixed (value-identity).

UX: a fact table is a reusable data definition (read-only SELECT + introspected
columns) that fact metrics are built on — a modeling primitive, not an
observation surface. Move it from Observe into Plan alongside events/variables,
and reorder Observe to Live activity / Metrics / Monitors / Anomalies / Alerting
so 'define' and 'watch' surfaces no longer interleave. resolveNavLocation derives
the breadcrumb area from the group, so Fact tables now reads 'Plan > Fact tables'.

UX: fact tables exist only to back fact metrics, so they no longer warrant a
standalone top-level nav item. Metrics becomes a tabbed shell (Catalog | Fact
tables) with URL-driven tabs (deep-linkable, back/forward works); the catalog
body is extracted to MetricsCatalog.tsx and the fact-tables list to
FactTablesList.tsx (FactTablesPage removed). The primary action is contextual
(New metric / New fact table) and the H1 stays 'Metrics'.

Routing: fact-table routes move under /metrics/fact-tables[/new|/:id/edit];
the old /fact-tables* routes redirect (param-preserving) so links/bookmarks
survive. The standalone 'Fact tables' sidebar item is removed — the Metrics
nav item matches /metrics* so it stays active on the tab; breadcrumb reads
'Observe > Metrics'. Back-links + edit links repointed; tests updated.

Gates: tsc + eslint clean, 54 tests pass across metrics/fact-tables/navigation.

@vladenisov merged commit 997cd36 into main on Jun 30, 2026
2 checks passed