
COE-227: Canonical Data Store and Collection Pipeline #5

Open

kumanday wants to merge 1 commit into main from
leonardogonzalez/coe-227-canonical-data-store-and-collection-pipeline

Conversation

@kumanday (Collaborator)

Implements the canonical benchmark database, collectors, normalization, and rollups for COE-227.

Changes

  • Database schema with 9 canonical tables (providers, harness_profiles, variants, experiments, task_cards, sessions, requests, metric_rollups, artifacts)
  • Repository layer with 9 repositories for all entities
  • Session service for lifecycle management
  • LiteLLM collector and normalizer for request ingestion
  • Prometheus collector and rollup computations
  • Unit and integration test suites (57 tests total)

Test Results

All 57 tests pass (40 unit + 17 integration)

Acceptance Criteria

All 15 acceptance criteria met as documented in the Linear issue.

Implements COE-227:
- Database schema with 9 canonical tables (providers, harness_profiles,
  variants, experiments, task_cards, sessions, requests, metric_rollups,
  artifacts)
- Repository layer with 9 repositories for all entities
- Session service for lifecycle management
- LiteLLM collector and normalizer for request ingestion
- Prometheus collector and rollup computations
- Unit and integration test suites

Key design decisions:
- SQLAlchemy async ORM with PostgreSQL UUID types
- Foreign key constraints for referential integrity
- Unique constraints on session_id, proxy_key_alias, litellm_call_id
- Percentile calculations with empty list handling for NumPy 2.x

Tests: 40 unit tests + 17 integration tests all pass
@kumanday kumanday added the symphony Symphony orchestrated task label Mar 21, 2026
@coderabbitai

coderabbitai bot commented Mar 21, 2026

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82763a0ab8


Comment on lines +203 to +207

```python
session_rollups = await self.session.execute(
    select(MetricRollupModel).where(
        MetricRollupModel.scope_type == RollupScopeType.SESSION,
        MetricRollupModel.scope_id.in_([s.session_id for s in sessions]),
    )
```


P1 Badge Aggregate variant latency from requests, not session medians

Variant-level median_latency_ms/p95_latency_ms are built from one precomputed median per session, so sessions with very different request counts are weighted equally. In a common case like 100 fast requests in one session and 1 slow request in another, this reports a 50/50 blend instead of the true request-level distribution, which will skew the core variant comparisons this service is meant to produce.

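The weighting problem can be reproduced with the stdlib alone; the numbers below mirror the 100-fast/1-slow example from the review comment:

```python
import statistics

# Two sessions with very different request counts (latencies in ms).
session_a = [10.0] * 100   # 100 fast requests
session_b = [1000.0]       # 1 slow request

# Behavior the review flags: blend one precomputed median per session,
# so each session carries equal weight regardless of request count.
blended = statistics.median(
    [statistics.median(session_a), statistics.median(session_b)]
)  # 505.0 — a 50/50 blend of the two session medians

# Suggested fix: pool the raw requests so every request counts once.
pooled = statistics.median(session_a + session_b)  # 10.0
```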

Comment on lines +148 to +151

```python
session_id=UUID(session_id) if session_id else None,
experiment_id=UUID(correlation_keys["experiment_id"]) if correlation_keys.get("experiment_id") else None,
variant_id=UUID(correlation_keys["variant_id"]) if correlation_keys.get("variant_id") else None,
provider_id=UUID(correlation_keys["provider_id"]) if correlation_keys.get("provider_id") else None,
```


P1 Badge Populate experiment and variant IDs after alias-based session lookup

When a LiteLLM row is correlated only through proxy_key_alias, _resolve_session() can recover the session, but this constructor still leaves experiment_id and variant_id null because it only trusts raw tags. compute_experiment_rollups() later filters on RequestModel.experiment_id, so alias-only traffic disappears from experiment totals and latency aggregates even though the matched session already tells us which experiment/variant it belongs to.

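One possible shape for the fix, as a hedged sketch: prefer the raw tags, then fall back to the session recovered via proxy_key_alias. The helper name and the session attribute names are assumptions about the repo's models:

```python
from uuid import UUID


def correlation_ids(correlation_keys: dict, resolved_session) -> dict:
    """Prefer raw tags, then fall back to the alias-resolved session.

    `resolved_session` stands in for the row _resolve_session() returns;
    attribute names here are guesses at the real ORM model.
    """
    def pick(tag_key: str):
        raw = correlation_keys.get(tag_key)
        if raw:
            return UUID(raw)
        # Fall back to the matched session so alias-only traffic still
        # lands in experiment/variant rollups.
        return getattr(resolved_session, tag_key, None)

    return {
        "experiment_id": pick("experiment_id"),
        "variant_id": pick("variant_id"),
    }
```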

Comment on lines +127 to +129

```python
if await self.request_repo.exists_by_litellm_call_id(
    self.session, litellm_call_id
):
```


P1 Badge Make duplicate request ingestion atomic

This existence check does not make ingestion idempotent under concurrency. If two collector workers or retries ingest the same litellm_call_id at the same time, both can observe False here and then race into RequestRepository.create(), where the loser hits the unique constraint on commit and fails the collection run instead of treating the duplicate as a no-op.

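One common way to make this atomic on PostgreSQL is an `ON CONFLICT DO NOTHING` insert keyed on the existing unique constraint, so concurrent duplicates become no-ops instead of IntegrityErrors. The table below is a minimal stand-in for the real requests model, not the repo's actual schema:

```python
from sqlalchemy import Column, MetaData, String, Table
from sqlalchemy.dialects import postgresql

metadata = MetaData()

# Minimal stand-in for the real requests table (columns are assumptions).
requests = Table(
    "requests", metadata,
    Column("litellm_call_id", String, unique=True),
    Column("model", String),
)


def idempotent_insert(values: dict):
    """Build an insert that turns duplicate litellm_call_id rows into no-ops."""
    return (
        postgresql.insert(requests)
        .values(**values)
        .on_conflict_do_nothing(index_elements=["litellm_call_id"])
    )


stmt = idempotent_insert({"litellm_call_id": "abc123", "model": "gpt-4o"})
sql = str(stmt.compile(dialect=postgresql.dialect()))
```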

```toml
]

[project.scripts]
bench = "cli.main:cli"
```


P2 Badge Point the bench script at a real CLI entrypoint

Installing this package will register bench, but the entrypoint targets cli.main:cli and this commit only adds src/cli/__init__.py; there is no src/cli/main.py or exported cli object anywhere in the repo. Anyone following the documented pip install -e . flow will get an import error before the CLI can start.

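A minimal `src/cli/main.py` that would satisfy the `bench = "cli.main:cli"` entrypoint could look like the following argparse sketch; the subcommand set is invented for illustration:

```python
import argparse


def cli(argv=None) -> int:
    """Console-script callable targeted by bench = "cli.main:cli"."""
    parser = argparse.ArgumentParser(prog="bench")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("version", help="print the CLI version")
    args = parser.parse_args(argv)
    if args.command == "version":
        print("bench 0.1.0")
    return 0


if __name__ == "__main__":
    raise SystemExit(cli())
```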
