feat(upsampling) - Support upsampled error count with performance optimizations#18
feat(upsampling) - Support upsampled error count with performance optimizations#18ShashankFC wants to merge 2 commits into
Conversation
…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link Events-stats API will now check if all projects in the query are allowlisted for upsampling, and convert the count query to a sum over `sample_weight` in Snuba, this is done by defining a new SnQL function `upsampled_count()`. I noticed there are also eps() and epm() functions in use in this endpoint. I considered (and even worked on) also supporting swapping eps() and epm() which for correctness should probably also not count naively and use `sample_weight`, but this caused some complications and since they are only in use by specific dashboard widgets and not available in discover I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance - Separate upsampling eligibility check from query transformation for better optimization - Remove unnecessary null checks in upsampled_count() function per schema requirements - Add cache invalidation utilities for configuration management This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
There was a problem hiding this comment.
Pull request overview
This PR introduces error upsampling support for organization event statistics queries, allowing sample-weighted counting of errors for allowlisted projects to improve accuracy of error metrics. The implementation includes performance optimizations through caching and selective application based on project allowlisting and query type.
Changes:
- Added error upsampling logic with caching for allowlist eligibility checks
- Implemented query column transformation to use
upsampled_count()aggregation - Added comprehensive test coverage for various upsampling scenarios
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/sentry/api/helpers/error_upsampling.py | New module implementing error upsampling logic with caching and transformation functions |
| src/sentry/api/endpoints/organization_events_stats.py | Integrated upsampling checks and column transformations into stats endpoint |
| src/sentry/search/events/datasets/discover.py | Added upsampled_count() SnQL function for sample-weighted counting |
| src/sentry/testutils/factories.py | Enhanced event storage to extract and set sample_rate from error_sampling context |
| tests/sentry/api/helpers/test_error_upsampling.py | Unit tests for upsampling helper functions |
| tests/snuba/api/endpoints/test_organization_events_stats.py | Integration tests for upsampling behavior across different scenarios |
| pyproject.toml | Added new modules to mypy type checking configuration |
Comments suppressed due to low confidence (1)
src/sentry/api/helpers/error_upsampling.py:1
- The variable
upsampling_enabledis assigned the same value asshould_upsampleon line 226 but adds no additional logic or transformation. This creates unnecessary duplication. Useshould_upsampledirectly in the conditional checks instead of creating an alias.
from collections.abc import Sequence
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| expensive repeated option lookups during high-traffic periods. This is safe | ||
| because allowlist changes are infrequent and eventual consistency is acceptable. | ||
| """ | ||
| cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}" |
There was a problem hiding this comment.
Using hash() for cache keys is unsafe because Python's default hash() is randomized across process restarts (hash randomization). This will cause cache misses across different processes or after restarts. Use a deterministic hashing function like hashlib.md5() or construct the key directly from sorted project IDs.
| This should be called when the allowlist configuration changes to ensure | ||
| cache consistency across the system. | ||
| """ | ||
| cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}" |
There was a problem hiding this comment.
Using hash() for cache keys is unsafe due to Python's hash randomization. The cache key in invalidate_upsampling_cache() must match the key format in is_errors_query_for_error_upsampled_projects(), but hash randomization will cause them to differ across processes. Use a deterministic hashing approach instead.
| cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}" | |
| sorted_project_ids = sorted(project_ids) | |
| project_ids_key = ",".join(str(project_id) for project_id in sorted_project_ids) | |
| cache_key = f"error_upsampling_eligible:{organization_id}:{project_ids_key}" |
|
|
||
| # Store the upsampling decision to apply later during query building | ||
| # This separation allows for better query optimization and caching | ||
| upsampling_enabled = should_upsample | ||
| final_columns = query_columns | ||
|
|
||
| if top_events > 0: | ||
| # Apply upsampling transformation just before query execution | ||
| # This late transformation ensures we use the most current schema assumptions | ||
| if upsampling_enabled: | ||
| final_columns = transform_query_columns_for_error_upsampling(query_columns) | ||
|
|
There was a problem hiding this comment.
The transformation transform_query_columns_for_error_upsampling(query_columns) is repeated identically in three separate code paths (lines 233, 277, and 296). This violates the DRY principle. Consider applying the transformation once immediately after the should_upsample check and before the branching logic.
| # Store the upsampling decision to apply later during query building | |
| # This separation allows for better query optimization and caching | |
| upsampling_enabled = should_upsample | |
| final_columns = query_columns | |
| if top_events > 0: | |
| # Apply upsampling transformation just before query execution | |
| # This late transformation ensures we use the most current schema assumptions | |
| if upsampling_enabled: | |
| final_columns = transform_query_columns_for_error_upsampling(query_columns) | |
| # Store the upsampling decision to apply later during query building | |
| # This separation allows for better query optimization and caching | |
| upsampling_enabled = should_upsample | |
| final_columns = ( | |
| transform_query_columns_for_error_upsampling(query_columns) | |
| if upsampling_enabled | |
| else query_columns | |
| ) | |
| if top_events > 0: |
Test 3
Summary by CodeRabbit
Release Notes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.
Replicated from ai-code-review-evaluation/sentry-coderabbit#3