feat(upsampling) - Support upsampled error count with performance optimizations#8
ShashankFC wants to merge 2 commits into
…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

The events-stats API now checks whether all projects in the query are allowlisted for upsampling and, if so, converts the count query into a sum over `sample_weight` in Snuba. This is done by defining a new SnQL function, `upsampled_count()`.

I noticed that the `eps()` and `epm()` functions are also used in this endpoint. I considered (and even worked on) swapping them as well, since for correctness they should probably also use `sample_weight` instead of counting naively. However, this caused some complications, and because they are only used by specific dashboard widgets and are not available in Discover, I decided to defer changing them until it proves necessary.
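A minimal Python sketch of the counting rule described above, assuming each stored error event carries a `sample_weight` equal to the inverse of its client sample rate. When every project in the query is allowlisted, the count becomes a sum of weights; otherwise it stays a plain count. `ErrorEvent`, `all_projects_allowlisted`, and `count_errors` are hypothetical names for illustration only; the actual change expresses this as the SnQL function `upsampled_count()` evaluated in Snuba, not in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    # Hypothetical stand-in for a stored error row; not Snuba's schema.
    project_id: int
    sample_weight: float  # assumed to be 1 / client sample rate


def all_projects_allowlisted(query_project_ids: set[int], allowlist: set[int]) -> bool:
    # Upsample only if the query has projects and every one is allowlisted.
    return bool(query_project_ids) and query_project_ids <= allowlist


def count_errors(
    events: list[ErrorEvent], query_project_ids: set[int], allowlist: set[int]
) -> float:
    if all_projects_allowlisted(query_project_ids, allowlist):
        # upsampled_count()-style: each stored event stands for sample_weight events.
        return sum((e.sample_weight for e in events), 0.0)
    # Otherwise keep the naive count(), so mixed-project queries are unchanged.
    return float(len(events))


events = [ErrorEvent(1, 10.0), ErrorEvent(1, 10.0), ErrorEvent(2, 1.0)]
print(count_errors(events, {1, 2}, {1, 2}))  # 21.0
print(count_errors(events, {1, 2}, {1}))  # 3.0
```

The all-or-nothing check mirrors the stated behavior: a single non-allowlisted project in the query leaves the whole series as a naive count rather than mixing weighted and unweighted rows.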
- Add a 60-second cache for upsampling eligibility checks to improve performance
- Separate the upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in the `upsampled_count()` function, per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
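The caching bullets above can be illustrated with a small in-process TTL cache plus explicit invalidation. This is a sketch under stated assumptions, not the PR's code: `TTLCache` and `is_upsampling_eligible` are hypothetical names, and the cache key `(org_id, project_ids)` is an assumed shape. In a multi-process deployment a shared cache backend would be the more likely choice; this version only demonstrates the 60-second expiry and invalidation behavior.

```python
import time
from collections.abc import Callable, Hashable, Iterable


class TTLCache:
    """In-process cache whose entries expire ttl seconds after being stored."""

    def __init__(self, ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[Hashable, tuple[float, bool]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], bool]) -> bool:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: skip the expensive allowlist lookup
        value = compute()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Drop one entry, e.g. after the allowlist configuration changes.
        self._entries.pop(key, None)

    def clear(self) -> None:
        self._entries.clear()


_eligibility_cache = TTLCache(ttl=60.0)


def is_upsampling_eligible(org_id: int, project_ids: Iterable[int], allowlist: set[int]) -> bool:
    # Hypothetical helper. The allowlist is not part of the key, so a changed
    # allowlist is only seen after expiry or an explicit invalidate()/clear().
    ids = frozenset(project_ids)
    return _eligibility_cache.get_or_compute(
        (org_id, ids), lambda: bool(ids) and ids <= allowlist
    )
```

The trade-off is the one the commit message describes: up to 60 seconds of staleness after an allowlist change, unless the invalidation utilities are called, in exchange for skipping repeated lookups under load.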
@CodeAnt-AI: review

CodeAnt AI is running the review.
Nitpicks 🔍
```python
# Early upsampling eligibility check for performance optimization
# This cached result ensures consistent behavior across query execution
should_upsample = is_errors_query_for_error_upsampled_projects(
    snuba_params, organization, dataset, request
)
```
Suggestion: The upsampling eligibility check always uses the outer `dataset` value instead of the per-call `scoped_dataset`. In dashboard widget split flows, where `_get_event_stats` is invoked with different datasets (for example `discover` for error-only queries), error upsampling may therefore be incorrectly disabled or enabled for the query actually being run. The check should use `scoped_dataset` so that the decision matches the dataset of each query execution. [logic error]
Severity Level: Critical 🚨
- ❌ Error dashboards undercount events for upsampled error projects.
- ⚠️ `sentry-api-0-organization-events-stats` misreports metricsEnhanced error series.
- ⚠️ Dashboard widgets using metricsEnhanced errors show inconsistent counts.

```diff
-    snuba_params, organization, dataset, request
+    snuba_params, organization, scoped_dataset, request
```
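To see why the outer dataset matters, here is a minimal sketch of a dataset gate that, as the review states for `_should_apply_sample_weight_transform`, returns true only for `errors` or an error-focused `discover` query. The string dataset names and the `is_error_focused` heuristic are hypothetical simplifications; the real helper inspects dataset modules and the request.

```python
# Hypothetical string stand-ins for the dataset modules the real helper checks.
ERRORS = "errors"
DISCOVER = "discover"
METRICS_ENHANCED = "metrics_enhanced_performance"


def is_error_focused(query: str) -> bool:
    # Assumption: treat a query as error-focused if any term filters on
    # event.type:error or an error.* field; the real heuristic may differ.
    return any(
        term == "event.type:error" or term.startswith("error.")
        for term in query.split()
    )


def should_apply_sample_weight_transform(dataset: str, query: str) -> bool:
    if dataset == ERRORS:
        return True
    if dataset == DISCOVER:
        return is_error_focused(query)
    # Any other dataset, including metrics-enhanced, is never upsampled here.
    return False


query = "error.type:test_error"
print(should_apply_sample_weight_transform(METRICS_ENHANCED, query))  # False (outer dataset)
print(should_apply_sample_weight_transform(DISCOVER, query))  # True (scoped dataset)
```

The same error-only query gets opposite answers depending on which dataset is passed, which is exactly the discrepancy between the closed-over `dataset` and `scoped_dataset`.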
Steps of Reproduction ✅
1. Trigger the metrics-enhanced events stats endpoint using the existing tests in `tests/snuba/api/endpoints/test_organization_events_stats_mep.py`, for example `OrganizationEventsStatsMetricsEnhancedPerformanceEndpointTest.test_split_decision_for_top_events_errors_widget` (around lines 859–883). This test issues a GET to the `sentry-api-0-organization-events-stats` endpoint with `dataset=metricsEnhanced`, `topEvents=5`, and an error-focused query (`field=["count()","error.type"]`, `query="error.type:test_error"`), and passes `dashboardWidgetId=<widget id>`.
2. The request is handled by `OrganizationEventsStatsEndpoint.get` in `src/sentry/api/endpoints/organization_events_stats.py` (lines 105–195). It resolves `dataset` to the metrics-enhanced dataset (e.g. `metrics_enhanced_performance`), sets the `use_rpc`/`use_metrics` flags, and then constructs `get_event_stats = get_event_stats_factory(dataset)` at line 499, which closes over this outer `dataset` variable.
3. Inside `get_event_stats_factory` (lines 326–497), because `metrics_enhanced` is true and `dashboardWidgetId` is provided, the inner `fn` (lines 335–495) takes the dashboard widget path: it computes `errors_only_query` and calls `_get_event_stats(discover, query_columns, errors_only_query, ...)` at lines 382–394. That is, `scoped_dataset` is the `discover` dataset module, while the closed-over `dataset` remains the metrics-enhanced dataset module.
4. When `_get_event_stats` (defined at lines 209–324) is invoked for this error-only query, it runs the existing code at lines 218–222: `should_upsample = is_errors_query_for_error_upsampled_projects(snuba_params, organization, dataset, request)`. The helper `is_errors_query_for_error_upsampled_projects` (in `src/sentry/api/helpers/error_upsampling.py:13–40`) calls `_should_apply_sample_weight_transform(dataset, request)` (line 32), which checks the *dataset module* (lines 103–127) and returns true only for `errors` or an error-focused `discover` query. Because `_get_event_stats` passes the outer `dataset` (metrics-enhanced) rather than the per-call `scoped_dataset` (`discover`), `_should_apply_sample_weight_transform` returns False even when the actual query runs against `discover` with an error-only filter and all projects are allowlisted for upsampling. As a result, `should_upsample` is False and upsampling is incorrectly disabled for these split error queries.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/sentry/api/endpoints/organization_events_stats.py
**Line:** 221:221
**Comment:**
*Logic Error: The upsampling eligibility check currently always uses the outer `dataset` value instead of the per-call `scoped_dataset`, so in dashboard widget split flows (where `_get_event_stats` is invoked with different datasets like `discover` for error-only queries) error upsampling may incorrectly be disabled or enabled for the actual query being run; this should use `scoped_dataset` so the decision matches the dataset of each query execution.
Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
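A concise sketch of the fix, using a simplified, hypothetical version of the factory: the eligibility check inside `_get_event_stats` is evaluated against `scoped_dataset`, so the error-only split call that runs on `discover` can upsample even though the outer `dataset` is metrics-enhanced. Signatures are reduced to the one argument that matters; the real functions also take `snuba_params`, `organization`, `request`, and more, and the `is_eligible` predicate here stands in for `is_errors_query_for_error_upsampled_projects`.

```python
from collections.abc import Callable


def get_event_stats_factory(
    dataset: str, is_eligible: Callable[[str], bool]
) -> Callable[[bool], list[tuple[str, bool]]]:
    """Simplified, hypothetical stand-in for the endpoint's factory."""

    def _get_event_stats(scoped_dataset: str) -> tuple[str, bool]:
        # The fix: evaluate eligibility against the dataset this call
        # actually queries (scoped_dataset), not the closed-over `dataset`.
        should_upsample = is_eligible(scoped_dataset)
        return scoped_dataset, should_upsample

    def fn(is_dashboard_widget_split: bool) -> list[tuple[str, bool]]:
        if is_dashboard_widget_split:
            # As in step 3: the error-only part of the split runs on discover.
            return [_get_event_stats("discover")]
        return [_get_event_stats(dataset)]

    return fn


# Assumed predicate: treat errors and discover as eligible for this demo.
stats = get_event_stats_factory("metrics_enhanced_performance", lambda ds: ds in {"errors", "discover"})
print(stats(True))  # [('discover', True)]
print(stats(False))  # [('metrics_enhanced_performance', False)]
```

Passing the per-call dataset keeps the decision local to each query, which also stays correct if the factory later gains more split branches.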
CodeAnt AI finished running the review.
User description
Test 3
Summary by CodeRabbit
Release Notes
New Features
Tests
Replicated from ai-code-review-evaluation/sentry-coderabbit#3
CodeAnt-AI Description
Support error upsampling in organization event statistics with cached eligibility checks
What Changed
Impact
- ✅ Accurate error counts for allowlisted sampled events
- ✅ Unchanged counts when not all projects are allowlisted
- ✅ Lower overhead for repeated allowlist checks during high-traffic queries