feat(upsampling) - Support upsampled error count with performance optimizations #3
everettbu wants to merge 2 commits
…(#94376)

Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

The events-stats API now checks whether all projects in the query are allowlisted for upsampling and, if so, converts the count query to a sum over `sample_weight` in Snuba. This is done by defining a new SnQL function, `upsampled_count()`.

This endpoint also uses the `eps()` and `epm()` functions. I considered (and even worked on) swapping those as well, since for correctness they should probably use `sample_weight` rather than count naively. That caused some complications, and because they are only used by specific dashboard widgets and are not available in Discover, I decided to defer changing them until we know it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
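The caching behavior described in this commit can be sketched roughly as follows. This is a minimal in-process illustration, not the PR's implementation: the `EligibilityCache` class, its key layout, and the injectable clock are hypothetical, and the real code presumably goes through Sentry's shared cache backend. Only the 60-second TTL and the existence of an invalidation utility come from the description above.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

CACHE_TTL_SECONDS = 60  # TTL taken from the commit message

CacheKey = Tuple[int, Tuple[int, ...]]


class EligibilityCache:
    """Hypothetical stand-in for the eligibility cache described above."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, bool]] = {}

    @staticmethod
    def _key(org_id: int, project_ids: Iterable[int]) -> CacheKey:
        # Sort the ids so the same project set always maps to the same key.
        return org_id, tuple(sorted(project_ids))

    def get_or_compute(
        self,
        org_id: int,
        project_ids: Iterable[int],
        compute: Callable[[List[int]], bool],
    ) -> bool:
        ids = list(project_ids)
        key = self._key(org_id, ids)
        now = self._clock()
        hit = self._entries.get(key)
        # A cached False is still a hit, so test for None rather than truthiness.
        if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]
        value = compute(ids)  # the expensive allowlist lookup
        self._entries[key] = (now, value)
        return value

    def invalidate(self, org_id: int, project_ids: Iterable[int]) -> None:
        # Drop the entry so the next call recomputes, e.g. after a config change.
        self._entries.pop(self._key(org_id, project_ids), None)
```

With this shape, repeated requests for the same organization and project set within 60 seconds reuse one lookup, and a configuration change can call `invalidate` instead of waiting for the TTL to expire.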
Greptile Summary
This PR introduces a comprehensive error upsampling feature that transforms count() queries to upsampled_count() for allowlisted projects when querying error events. The implementation addresses the problem of inaccurate error counts when client-side sampling is applied by using sample weights to extrapolate true error volumes.
The key components added include:
- Core upsampling logic (`src/sentry/api/helpers/error_upsampling.py`) - Contains eligibility checking, query transformation, and caching mechanisms with a 60-second TTL to optimize performance
- Database integration - Adds the `upsampled_count` function to the Discover dataset that sums `sample_weight` columns instead of counting rows
- API endpoint integration - Implements upsampling in the organization events stats endpoint across three query execution paths (top events, RPC, and standard queries)
- Test infrastructure - Adds comprehensive test coverage and updates test factories to handle error sampling context
- Git submodule - Adds a new `sentry-repo` submodule (though the content appears empty)
The feature uses an allowlist-based approach where ALL projects in a query must be on the allowlist to enable upsampling, ensuring data consistency. The system includes dataset-specific logic to apply transformations only to error-related queries where sample weights exist, avoiding issues with transaction data.
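To make the summary concrete, here is a small sketch of the two ideas it describes: the all-projects allowlist rule and the difference between counting rows and summing `sample_weight`. The function names are illustrative, and the relation `sample_weight = 1 / client_sample_rate` is an assumption that this page does not state. Treating an empty project list as ineligible is also an assumption.

```python
from typing import Iterable, List, Set


def all_projects_allowlisted(project_ids: Iterable[int], allowlist: Set[int]) -> bool:
    """Upsampling applies only if every queried project is on the allowlist."""
    ids = list(project_ids)
    # Assumption: a query with no projects is treated as ineligible.
    return bool(ids) and all(pid in allowlist for pid in ids)


def naive_count(sample_weights: List[float]) -> int:
    """What count() reports: one per stored event."""
    return len(sample_weights)


def upsampled_count(sample_weights: List[float]) -> float:
    """What a sum over sample_weight reports: the extrapolated volume."""
    return sum(sample_weights)


# Three events kept at an assumed 25% client sample rate (weight 4.0 each)
# plus two unsampled events (weight 1.0 each).
weights = [1 / 0.25] * 3 + [1.0] * 2

print(naive_count(weights))      # 5 stored rows
print(upsampled_count(weights))  # 14.0 estimated events
```

Because weights derived from a sample rate need not be whole numbers, a sum over them is not guaranteed to be an integer.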
Confidence score: 2/5
- This PR has significant implementation concerns that make it risky to merge without addressing critical issues
- The main issues are an empty git submodule, potential race conditions in caching logic, and assumptions about database schema without verification
- Files needing attention: `sentry-repo` (empty submodule), `src/sentry/api/helpers/error_upsampling.py` (race condition and schema assumptions), `src/sentry/api/endpoints/organization_events_stats.py` (code duplication)
7 files reviewed, 6 comments
| ) | ||
| except Exception: | ||
| pass | ||
| if client_sample_rate: |
logic: The condition `if client_sample_rate:` evaluates to False for a valid sample rate of 0.0. Consider using `if client_sample_rate is not None:` to handle zero sample rates correctly.
| if client_sample_rate: | |
| if client_sample_rate is not None: |
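The difference between the two conditions is standard Python truthiness and can be checked in isolation. The helper names below are made up for the illustration and do not appear in the PR.

```python
from typing import Optional


def has_rate_truthy(client_sample_rate: Optional[float]) -> bool:
    # Mirrors `if client_sample_rate:`; 0.0 is falsy, so it is skipped.
    return bool(client_sample_rate)


def has_rate_explicit(client_sample_rate: Optional[float]) -> bool:
    # Mirrors `if client_sample_rate is not None:`; only None is skipped.
    return client_sample_rate is not None


for rate in (None, 0.0, 0.25):
    print(rate, has_rate_truthy(rate), has_rate_explicit(rate))
# None False False
# 0.0 False True
# 0.25 True True
```

Whether a rate of 0.0 should actually be written into test events is a separate question; the sketch only shows that the two checks disagree on that one input.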
| [Function("sum", [Column("sample_weight")])], | ||
| alias, | ||
| ), | ||
| default_result_type="number", |
logic: `default_result_type` is set to "number", but the comment and typical count operations suggest it should be "integer". This could cause type inconsistencies in query results.
| if upsampling_enabled: | ||
| final_columns = transform_query_columns_for_error_upsampling(query_columns) |
style: Transformation logic is duplicated in three places (lines 232-233, 276-277, 295-296). Consider extracting it into a helper function.
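One possible shape for such a helper is sketched below. `resolve_final_columns` is a hypothetical name, and the body of `transform_query_columns_for_error_upsampling` is a stand-in reconstructed from the snippets quoted on this page. In particular, how the PR normalizes a column before comparing it to `"count()"` is assumed here.

```python
from typing import List, Sequence


def transform_query_columns_for_error_upsampling(columns: Sequence[str]) -> List[str]:
    """Stand-in for the PR's function, based only on the quoted diff lines."""
    transformed_columns: List[str] = []
    for column in columns:
        column_lower = column.lower().strip()  # normalization is an assumption
        if column_lower == "count()":
            transformed_columns.append("upsampled_count() as count")
        else:
            transformed_columns.append(column)
    return transformed_columns


def resolve_final_columns(query_columns: Sequence[str], upsampling_enabled: bool) -> List[str]:
    """Hypothetical helper replacing the repeated `if upsampling_enabled:` blocks."""
    if upsampling_enabled:
        return transform_query_columns_for_error_upsampling(query_columns)
    return list(query_columns)
```

Each of the three execution paths could then call `resolve_final_columns(query_columns, upsampling_enabled)` once, so a future change to the transformation rule touches a single place.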
| if column_lower == "count()": | ||
| # Transform to upsampled count - assumes sample_weight column exists | ||
| # for all events in allowlisted projects per our data model requirements | ||
| transformed_columns.append("upsampled_count() as count") |
logic: The transformation assumes the `upsampled_count()` function exists in the database schema. Queries could fail if the function is not available.
| if cached_result is not None: | ||
| return cached_result and _should_apply_sample_weight_transform(dataset, request) |
logic: Caching eligibility but not the dataset/request check could lead to inconsistent behavior: eligibility might be cached as true while the dataset check returns different results between calls.
| Check if a query is focused on error events. | ||
| Reduced to only check for event.type:error to err on the side of caution. | ||
| """ | ||
| query = request.GET.get("query", "").lower() |
style: A simple string containment check for 'event.type:error' could match unintended queries such as 'event.type:error_custom', or text inside quoted strings.
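A stricter check could look something like the sketch below, which drops quoted segments and then requires `event.type:error` to appear as a standalone term. This is only an approximation built on regular expressions. It does not use Sentry's search grammar, and `is_error_focused_query` is a hypothetical name. A production fix would more likely go through the existing query parser.

```python
import re

# Double-quoted segments are removed before matching so that text such as
# message:"event.type:error" is not mistaken for a filter.
_QUOTED = re.compile(r'"[^"]*"')

# The term must not be preceded by a word character, '.', '-', or the '!'
# negation prefix, and must not continue into a longer value such as
# 'error_custom'.
_ERROR_TERM = re.compile(r"(?<![\w.!\-])event\.type:error(?![\w.\-])", re.IGNORECASE)


def is_error_focused_query(query: str) -> bool:
    unquoted = _QUOTED.sub(" ", query)
    return _ERROR_TERM.search(unquoted) is not None
```

For example, `is_error_focused_query("level:fatal event.type:error")` is true, while `"event.type:error_custom"`, `'message:"event.type:error"'`, and `"!event.type:error"` are all rejected.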
mfeuerstein left a comment
PR Review — approved
Reviewed 8 files. 0 high-severity issues found. Verdict: approved.
src/sentry/testutils/factories.py (medium)
- Reviewed src/sentry/testutils/factories.py — looks good
pyproject.toml (low)
- Reviewed pyproject.toml — looks good
src/sentry/search/events/datasets/discover.py (low)
- Reviewed src/sentry/search/events/datasets/discover.py — looks good
src/sentry/api/helpers/error_upsampling.py (low)
- Reviewed src/sentry/api/helpers/error_upsampling.py — looks good
sentry-repo (low)
- Reviewed sentry-repo — looks good
tests/snuba/api/endpoints/test_organization_events_stats.py (low)
- Reviewed tests/snuba/api/endpoints/test_organization_events_stats.py — looks good
tests/sentry/api/helpers/test_error_upsampling.py (low)
- Reviewed tests/sentry/api/helpers/test_error_upsampling.py — looks good
src/sentry/api/endpoints/organization_events_stats.py (low)
- Reviewed src/sentry/api/endpoints/organization_events_stats.py — looks good
Test 3