Skip to content

feat(upsampling) - Support upsampled error count with performance optimizations#18

Open
ShashankFC wants to merge 2 commits into
masterfrom
error-upsampling-race-condition
Open

feat(upsampling) - Support upsampled error count with performance optimizations#18
ShashankFC wants to merge 2 commits into
masterfrom
error-upsampling-race-condition

Conversation

@ShashankFC

Copy link
Copy Markdown
Contributor

Test 3

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced error upsampling support for organization event statistics queries, improving accuracy of error metrics for eligible projects through sample-weighted counting.
    • Upsampling applies selectively based on project allowlisting and query type.
  • Tests

    • Added comprehensive test coverage for error upsampling across various project allowlist scenarios and event dataset types.

✏️ Tip: You can customize this high-level summary in your review settings.


Replicated from ai-code-review-evaluation/sentry-coderabbit#3

yuvmen and others added 2 commits July 25, 2025 09:48
…(#94376)

Part of the Error Upsampling project:
https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

Events-stats API will now check if all projects in the query are
allowlisted for upsampling, and convert the count query to a sum over
`sample_weight` in Snuba, this is done by defining a new SnQL function
`upsampled_count()`.

I noticed there are also eps() and epm() functions in use in this
endpoint. I considered (and even worked on) also supporting
swapping eps() and epm() which for correctness should probably also not
count naively and use `sample_weight`, but this
caused some complications and since they are only in use by specific
dashboard widgets and not available in discover
I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated
expensive allowlist lookups while maintaining data consistency.
@ShashankFC ShashankFC requested a review from Copilot January 30, 2026 10:31

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces error upsampling support for organization event statistics queries, allowing sample-weighted counting of errors for allowlisted projects to improve accuracy of error metrics. The implementation includes performance optimizations through caching and selective application based on project allowlisting and query type.

Changes:

  • Added error upsampling logic with caching for allowlist eligibility checks
  • Implemented query column transformation to use upsampled_count() aggregation
  • Added comprehensive test coverage for various upsampling scenarios

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/sentry/api/helpers/error_upsampling.py New module implementing error upsampling logic with caching and transformation functions
src/sentry/api/endpoints/organization_events_stats.py Integrated upsampling checks and column transformations into stats endpoint
src/sentry/search/events/datasets/discover.py Added upsampled_count() SnQL function for sample-weighted counting
src/sentry/testutils/factories.py Enhanced event storage to extract and set sample_rate from error_sampling context
tests/sentry/api/helpers/test_error_upsampling.py Unit tests for upsampling helper functions
tests/snuba/api/endpoints/test_organization_events_stats.py Integration tests for upsampling behavior across different scenarios
pyproject.toml Added new modules to mypy type checking configuration
Comments suppressed due to low confidence (1)

src/sentry/api/helpers/error_upsampling.py:1

  • The variable upsampling_enabled is assigned the same value as should_upsample on line 226 but adds no additional logic or transformation. This creates unnecessary duplication. Use should_upsample directly in the conditional checks instead of creating an alias.
from collections.abc import Sequence

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

expensive repeated option lookups during high-traffic periods. This is safe
because allowlist changes are infrequent and eventual consistency is acceptable.
"""
cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"

Copilot AI Jan 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using hash() for cache keys is unsafe because Python's default hash() is randomized across process restarts (hash randomization). This will cause cache misses across different processes or after restarts. Use a deterministic hashing function like hashlib.md5() or construct the key directly from sorted project IDs.

Copilot uses AI. Check for mistakes.
This should be called when the allowlist configuration changes to ensure
cache consistency across the system.
"""
cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}"

Copilot AI Jan 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using hash() for cache keys is unsafe due to Python's hash randomization. The cache key in invalidate_upsampling_cache() must match the key format in is_errors_query_for_error_upsampled_projects(), but hash randomization will cause them to differ across processes. Use a deterministic hashing approach instead.

Suggested change
cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}"
sorted_project_ids = sorted(project_ids)
project_ids_key = ",".join(str(project_id) for project_id in sorted_project_ids)
cache_key = f"error_upsampling_eligible:{organization_id}:{project_ids_key}"

Copilot uses AI. Check for mistakes.
Comment on lines +223 to +234

# Store the upsampling decision to apply later during query building
# This separation allows for better query optimization and caching
upsampling_enabled = should_upsample
final_columns = query_columns

if top_events > 0:
# Apply upsampling transformation just before query execution
# This late transformation ensures we use the most current schema assumptions
if upsampling_enabled:
final_columns = transform_query_columns_for_error_upsampling(query_columns)

Copilot AI Jan 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformation transform_query_columns_for_error_upsampling(query_columns) is repeated identically in three separate code paths (lines 233, 277, and 296). This violates the DRY principle. Consider applying the transformation once immediately after the should_upsample check and before the branching logic.

Suggested change
# Store the upsampling decision to apply later during query building
# This separation allows for better query optimization and caching
upsampling_enabled = should_upsample
final_columns = query_columns
if top_events > 0:
# Apply upsampling transformation just before query execution
# This late transformation ensures we use the most current schema assumptions
if upsampling_enabled:
final_columns = transform_query_columns_for_error_upsampling(query_columns)
# Store the upsampling decision to apply later during query building
# This separation allows for better query optimization and caching
upsampling_enabled = should_upsample
final_columns = (
transform_query_columns_for_error_upsampling(query_columns)
if upsampling_enabled
else query_columns
)
if top_events > 0:

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants