Skip to content

feat(upsampling) - Support upsampled error count with performance optimizations#3

Open
everettbu wants to merge 2 commits into
masterfrom
error-upsampling-race-condition
Open

feat(upsampling) - Support upsampled error count with performance optimizations#3
everettbu wants to merge 2 commits into
masterfrom
error-upsampling-race-condition

Conversation

@everettbu

@everettbu everettbu commented Jul 29, 2025

Copy link
Copy Markdown
Contributor

Test 3

yuvmen and others added 2 commits July 25, 2025 09:48
…(#94376)

Part of the Error Upsampling project:
https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

Events-stats API will now check if all projects in the query are
allowlisted for upsampling, and convert the count query to a sum over
`sample_weight` in Snuba, this is done by defining a new SnQL function
`upsampled_count()`.

I noticed there are also eps() and epm() functions in use in this
endpoint. I considered (and even worked on) also supporting
swapping eps() and epm() which for correctness should probably also not
count naively and use `sample_weight`, but this
caused some complications and since they are only in use by specific
dashboard widgets and not available in discover
I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated
expensive allowlist lookups while maintaining data consistency.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR introduces a comprehensive error upsampling feature that transforms count() queries to upsampled_count() for allowlisted projects when querying error events. The implementation addresses the problem of inaccurate error counts when client-side sampling is applied by using sample weights to extrapolate true error volumes.

The key components added include:

  1. Core upsampling logic (src/sentry/api/helpers/error_upsampling.py) - Contains eligibility checking, query transformation, and caching mechanisms with a 60-second TTL to optimize performance
  2. Database integration - Adds the upsampled_count function to the Discover dataset that sums sample_weight columns instead of counting rows
  3. API endpoint integration - Implements upsampling in the organization events stats endpoint across three query execution paths (top events, RPC, and standard queries)
  4. Test infrastructure - Adds comprehensive test coverage and updates test factories to handle error sampling context
  5. Git submodule - Adds a new sentry-repo submodule (though the content appears empty)

The feature uses an allowlist-based approach where ALL projects in a query must be on the allowlist to enable upsampling, ensuring data consistency. The system includes dataset-specific logic to apply transformations only to error-related queries where sample weights exist, avoiding issues with transaction data.

Confidence score: 2/5

  • This PR has significant implementation concerns that make it risky to merge without addressing critical issues
  • The main issues are an empty git submodule, potential race conditions in caching logic, and assumptions about database schema without verification
  • Files needing attention: sentry-repo (empty submodule), src/sentry/api/helpers/error_upsampling.py (race condition and schema assumptions), src/sentry/api/endpoints/organization_events_stats.py (code duplication)

7 files reviewed, 6 comments

Edit Code Review Bot Settings | Greptile

)
except Exception:
pass
if client_sample_rate:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The condition if client_sample_rate: will be False for valid sample rate of 0.0. Consider using if client_sample_rate is not None: to handle zero sample rates correctly.

Suggested change
if client_sample_rate:
if client_sample_rate is not None:

[Function("sum", [Column("sample_weight")])],
alias,
),
default_result_type="number",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The default_result_type is set to "number" but the comment and typical count operations suggest this should be "integer". This could cause type inconsistencies in query results.

Comment on lines +232 to +233
if upsampling_enabled:
final_columns = transform_query_columns_for_error_upsampling(query_columns)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Transformation logic is duplicated in three places (lines 232-233, 276-277, 295-296). Consider extracting to a helper function

if column_lower == "count()":
# Transform to upsampled count - assumes sample_weight column exists
# for all events in allowlisted projects per our data model requirements
transformed_columns.append("upsampled_count() as count")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The transformation assumes upsampled_count() function exists in the database schema. This could cause query failures if the function isn't available.

Comment on lines +31 to +32
if cached_result is not None:
return cached_result and _should_apply_sample_weight_transform(dataset, request)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Caching eligibility but not the dataset/request check could lead to inconsistent behavior - eligibility might be cached as true but dataset check could return different results between calls.

Check if a query is focused on error events.
Reduced to only check for event.type:error to err on the side of caution.
"""
query = request.GET.get("query", "").lower()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using simple string containment check for 'event.type:error' could match unintended queries like 'event.type:error_custom' or within quoted strings.

@GitHoobar

Copy link
Copy Markdown

Review Summary

🔍 Comments beyond diff scope (1)
src/sentry/api/endpoints/organization_events_stats.py (1)

87-90: flatten_results uses quadratic list summation (sum([...], [])), which causes O(n^2) time and memory usage for large result sets, severely degrading performance as data scales.
Category: performance


@mfeuerstein mfeuerstein left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review — approved

Reviewed 8 files. 0 high-severity issues found. Verdict: approved.

src/sentry/testutils/factories.py (medium)

  • Reviewed src/sentry/testutils/factories.py — looks good

pyproject.toml (low)

  • Reviewed pyproject.toml — looks good

src/sentry/search/events/datasets/discover.py (low)

  • Reviewed src/sentry/search/events/datasets/discover.py — looks good

src/sentry/api/helpers/error_upsampling.py (low)

  • Reviewed src/sentry/api/helpers/error_upsampling.py — looks good

sentry-repo (low)

  • Reviewed sentry-repo — looks good

tests/snuba/api/endpoints/test_organization_events_stats.py (low)

  • Reviewed tests/snuba/api/endpoints/test_organization_events_stats.py — looks good

tests/sentry/api/helpers/test_error_upsampling.py (low)

  • Reviewed tests/sentry/api/helpers/test_error_upsampling.py — looks good

src/sentry/api/endpoints/organization_events_stats.py (low)

  • Reviewed src/sentry/api/endpoints/organization_events_stats.py — looks good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants