feat(upsampling) - Support upsampled error count with performance optimizations by ShashankFC · Pull Request #18 · AI-Code-Review-Evals/copilot-sentry

ShashankFC · 2026-01-22T15:08:11Z

Test 3

Summary by CodeRabbit

Release Notes

New Features
- Introduced error upsampling support for organization event statistics queries, improving accuracy of error metrics for eligible projects through sample-weighted counting.
- Upsampling applies selectively based on project allowlisting and query type.
Tests
- Added comprehensive test coverage for error upsampling across various project allowlist scenarios and event dataset types.


Replicated from ai-code-review-evaluation/sentry-coderabbit#3

…(#94376)

Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

The events-stats API now checks whether all projects in the query are allowlisted for upsampling and, if so, converts the count query to a sum over `sample_weight` in Snuba. This is done by defining a new SnQL function, `upsampled_count()`.

This endpoint also uses the eps() and epm() functions. I considered (and even worked on) swapping those as well, since for correctness they should probably use `sample_weight` rather than count naively. That caused some complications, and because they are used only by specific dashboard widgets and are not available in Discover, I decided to defer changing them until it proves necessary.
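To make the difference between a naive count and a sum over `sample_weight` concrete, here is a minimal Python sketch. It is not the SnQL function from this PR: the `StoredEvent` class and both helper names are hypothetical, and it assumes each stored event carries a `sample_weight` equal to the inverse of its client sample rate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StoredEvent:
    """Hypothetical stand-in for one stored error event row."""

    # Assumption: weight is 1 / sample_rate, so an event kept at a
    # 25% sample rate stands in for 4 original events.
    sample_weight: float


def naive_count(events: Sequence[StoredEvent]) -> int:
    """Count stored rows, ignoring that sampling dropped some events."""
    return len(events)


def weighted_count(events: Sequence[StoredEvent]) -> float:
    """Estimate the original event volume by summing the weights."""
    return sum(event.sample_weight for event in events)


events = [StoredEvent(4.0), StoredEvent(4.0), StoredEvent(1.0)]
print(naive_count(events), weighted_count(events))
```

With two events sampled at 25% and one unsampled event, the naive count sees three rows while the weighted sum estimates nine original events.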

- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
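The caching idea in this commit message can be illustrated with a small in-memory time-to-live cache. This is only a sketch: `TTLCache` and its methods are hypothetical names, and the PR itself presumably uses the application's shared cache backend rather than a per-process dictionary.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after a fixed TTL."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], bool]) -> bool:
        """Return a fresh cached value, or run the expensive lookup and store it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry early, for example after an allowlist change."""
        self._entries.pop(key, None)
```

Within the 60-second window, repeated requests for the same key skip the allowlist lookup. The trade-off is that an allowlist change can take up to one TTL to become visible unless the entry is invalidated explicitly.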

Copilot

Pull request overview

This PR introduces error upsampling support for organization event statistics queries, allowing sample-weighted counting of errors for allowlisted projects to improve accuracy of error metrics. The implementation includes performance optimizations through caching and selective application based on project allowlisting and query type.

Changes:

- Added error upsampling logic with caching for allowlist eligibility checks
- Implemented query column transformation to use upsampled_count() aggregation
- Added comprehensive test coverage for various upsampling scenarios
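The query column transformation listed above can be pictured as a simple rewrite of the requested columns. The sketch below is not the PR's `transform_query_columns_for_error_upsampling()`: the function name is hypothetical, and the `as count` alias is an assumption made so that callers still see a column called `count`.

```python
from typing import List, Sequence


def swap_count_for_upsampled(query_columns: Sequence[str]) -> List[str]:
    """Replace a plain count() column with the sample-weighted aggregate.

    Hypothetical helper for illustration only.
    """
    transformed: List[str] = []
    for column in query_columns:
        if column.strip().lower() == "count()":
            # Assumption: alias the result back to "count" so that response
            # handling keyed on the column name keeps working.
            transformed.append("upsampled_count() as count")
        else:
            # Other aggregates, such as eps() and epm(), pass through unchanged.
            transformed.append(column)
    return transformed
```

Returning a new list rather than mutating the input keeps the original columns available for the non-upsampled code path.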

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


File	Description
src/sentry/api/helpers/error_upsampling.py	New module implementing error upsampling logic with caching and transformation functions
src/sentry/api/endpoints/organization_events_stats.py	Integrated upsampling checks and column transformations into stats endpoint
src/sentry/search/events/datasets/discover.py	Added `upsampled_count()` SnQL function for sample-weighted counting
src/sentry/testutils/factories.py	Enhanced event storage to extract and set sample_rate from error_sampling context
tests/sentry/api/helpers/test_error_upsampling.py	Unit tests for upsampling helper functions
tests/snuba/api/endpoints/test_organization_events_stats.py	Integration tests for upsampling behavior across different scenarios
pyproject.toml	Added new modules to mypy type checking configuration

Comments suppressed due to low confidence (1)

src/sentry/api/helpers/error_upsampling.py:1

The variable upsampling_enabled is assigned the same value as should_upsample on line 226 but adds no additional logic or transformation. This creates unnecessary duplication. Use should_upsample directly in the conditional checks instead of creating an alias.

from collections.abc import Sequence


Copilot · 2026-01-30T10:32:28Z

+    expensive repeated option lookups during high-traffic periods. This is safe
+    because allowlist changes are infrequent and eventual consistency is acceptable.
+    """
+    cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"


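Using hash() for cache keys is fragile. Python salts the hashes of str and bytes objects per process (hash randomization), and although a tuple of integer project IDs hashes consistently within one CPython build, that is an implementation detail, and two different project sets can collide on the same hash. Either case can produce missed or incorrect cache entries. Use a deterministic hashing function like hashlib.md5() or construct the key directly from sorted project IDs.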
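One way to follow this advice is to hash the sorted project IDs with `hashlib`, which gives the same digest in every process. The helper below is a sketch with a hypothetical name. It uses SHA-256 instead of the MD5 mentioned in the comment, purely as a choice for this example, and it assumes project IDs are integers.

```python
import hashlib
from typing import Iterable


def upsampling_cache_key(organization_id: int, project_ids: Iterable[int]) -> str:
    """Build a cache key that is identical across processes and restarts.

    Hypothetical helper; the key prefix is copied from the diff above.
    """
    # Sorting makes the key independent of the order the IDs arrive in.
    joined = ",".join(str(project_id) for project_id in sorted(project_ids))
    # A fixed-length digest keeps the key short even for many projects.
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return f"error_upsampling_eligible:{organization_id}:{digest}"
```

Joining the IDs directly, as in the suggested change further down, is equally deterministic and easier to read in cache dumps. A digest mainly helps when the project list is long enough to approach the backend's key length limit.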

Copilot · 2026-01-30T10:32:29Z

+    This should be called when the allowlist configuration changes to ensure
+    cache consistency across the system.
+    """
+    cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}"


Using hash() for cache keys is fragile because its result is not guaranteed to be stable across processes or Python builds. The cache key in invalidate_upsampling_cache() must match the key format in is_errors_query_for_error_upsampled_projects(); if the two keys are computed where hash() disagrees, the invalidation silently misses the cached entry. Use a deterministic hashing approach instead.

Suggested change

-    cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}"
+    sorted_project_ids = sorted(project_ids)
+    project_ids_key = ",".join(str(project_id) for project_id in sorted_project_ids)
+    cache_key = f"error_upsampling_eligible:{organization_id}:{project_ids_key}"

Copilot · 2026-01-30T10:32:29Z

+
+            # Store the upsampling decision to apply later during query building
+            # This separation allows for better query optimization and caching
+            upsampling_enabled = should_upsample
+            final_columns = query_columns
+
            if top_events > 0:
+                # Apply upsampling transformation just before query execution
+                # This late transformation ensures we use the most current schema assumptions
+                if upsampling_enabled:
+                    final_columns = transform_query_columns_for_error_upsampling(query_columns)
+


The transformation transform_query_columns_for_error_upsampling(query_columns) is repeated identically in three separate code paths (lines 233, 277, and 296). This violates the DRY principle. Consider applying the transformation once immediately after the should_upsample check and before the branching logic.

Suggested change

-            # Store the upsampling decision to apply later during query building
-            # This separation allows for better query optimization and caching
-            upsampling_enabled = should_upsample
-            final_columns = query_columns
-
-            if top_events > 0:
-                # Apply upsampling transformation just before query execution
-                # This late transformation ensures we use the most current schema assumptions
-                if upsampling_enabled:
-                    final_columns = transform_query_columns_for_error_upsampling(query_columns)
+            # Store the upsampling decision to apply later during query building
+            # This separation allows for better query optimization and caching
+            upsampling_enabled = should_upsample
+            final_columns = (
+                transform_query_columns_for_error_upsampling(query_columns)
+                if upsampling_enabled
+                else query_columns
+            )
+
+            if top_events > 0:

yuvmen and others added 2 commits July 25, 2025 09:48

ShashankFC requested a review from Copilot January 30, 2026 10:31

Copilot AI reviewed Jan 30, 2026

ShashankFC wants to merge 2 commits into master from error-upsampling-race-condition

3 participants

