feat(upsampling) - Support upsampled error count with performance optimizations by everettbu · Pull Request #3 · ai-code-review-evaluation/sentry-greptile

everettbu · 2025-07-29T00:30:32Z

Test 3

…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link The events-stats API now checks whether all projects in the query are allowlisted for upsampling and, if so, converts the count query to a sum over `sample_weight` in Snuba. This is done by defining a new SnQL function, `upsampled_count()`. I noticed that the eps() and epm() functions are also used in this endpoint. I considered (and even started working on) swapping them as well, since for correctness they should probably also use `sample_weight` rather than count naively. This caused some complications, however, and since they are only used by specific dashboard widgets and are not available in Discover, I decided to defer changing them until we find it is needed.

- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
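The 60-second eligibility cache and its invalidation hook could look roughly like the sketch below. It is not the PR's implementation: the class name `TTLCache`, the in-process dictionary, the injectable clock, and the key format are all assumptions made for illustration. The real code may well use a shared cache backend instead.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache with per-entry expiry (illustrative only)."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def get(self, key: str) -> Optional[bool]:
        """Return the cached value, or None on a miss or an expired entry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: bool) -> None:
        """Store a value that expires ttl_seconds from now."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: str) -> None:
        """Drop an entry early, e.g. after the allowlist configuration changes."""
        self._entries.pop(key, None)
```

Because `get` returns `None` only for a miss, a cached `False` stays distinguishable from "not cached", which is why callers would test the result with `is not None` rather than by truthiness.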

greptile-apps

Greptile Summary

This PR introduces a comprehensive error upsampling feature that transforms count() queries to upsampled_count() for allowlisted projects when querying error events. The implementation addresses the problem of inaccurate error counts when client-side sampling is applied by using sample weights to extrapolate true error volumes.

The key components added include:

- Core upsampling logic (src/sentry/api/helpers/error_upsampling.py) - Contains eligibility checking, query transformation, and caching mechanisms with a 60-second TTL to optimize performance
- Database integration - Adds the upsampled_count function to the Discover dataset that sums sample_weight columns instead of counting rows
- API endpoint integration - Implements upsampling in the organization events stats endpoint across three query execution paths (top events, RPC, and standard queries)
- Test infrastructure - Adds comprehensive test coverage and updates test factories to handle error sampling context
- Git submodule - Adds a new sentry-repo submodule (though the content appears empty)

The feature uses an allowlist-based approach where ALL projects in a query must be on the allowlist to enable upsampling, ensuring data consistency. The system includes dataset-specific logic to apply transformations only to error-related queries where sample weights exist, avoiding issues with transaction data.
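The all-or-nothing allowlist rule described above can be sketched as follows. This is a minimal illustration rather than the PR's code: the function name `all_projects_allowlisted`, the set-based allowlist, and the choice to treat an empty project list as ineligible are assumptions.

```python
from typing import Iterable, Set


def all_projects_allowlisted(project_ids: Iterable[int], allowlist: Set[int]) -> bool:
    """Return True only if every queried project is on the allowlist.

    Assumption: a query with no projects is treated as ineligible, so that
    upsampling is never switched on by a vacuously true check.
    """
    ids = list(project_ids)
    return bool(ids) and all(pid in allowlist for pid in ids)
```

Requiring every project to be allowlisted avoids mixing weighted and unweighted rows in a single aggregate: one project outside the allowlist is enough to fall back to the plain count.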

Confidence score: 2/5

- This PR has significant implementation concerns that make it risky to merge without addressing critical issues.
- The main issues are an empty git submodule, potential race conditions in caching logic, and assumptions about database schema without verification.
- Files needing attention: sentry-repo (empty submodule), src/sentry/api/helpers/error_upsampling.py (race condition and schema assumptions), src/sentry/api/endpoints/organization_events_stats.py (code duplication).

7 files reviewed, 6 comments


greptile-apps · 2025-07-29T15:09:19Z

+        )
+    except Exception:
+        pass
+    if client_sample_rate:


logic: The condition `if client_sample_rate:` evaluates to False when the sample rate is 0.0, even though 0.0 is a valid value. Consider using `if client_sample_rate is not None:` so that zero sample rates are handled correctly.

Suggested change

-    if client_sample_rate:
+    if client_sample_rate is not None:
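The difference between the two conditions is easy to check in isolation. The helper names below are hypothetical and exist only to put the two tests side by side. What the surrounding code should then do with a rate of 0.0 is a separate question that the diff excerpt does not answer.

```python
from typing import Optional


def has_rate_by_truthiness(rate: Optional[float]) -> bool:
    """Mirrors `if client_sample_rate:`; treats 0.0 the same as a missing rate."""
    return bool(rate)


def has_rate_by_none_check(rate: Optional[float]) -> bool:
    """Mirrors `if client_sample_rate is not None:`; only None counts as missing."""
    return rate is not None
```

Both helpers agree for `None` and for any non-zero rate. They differ only at 0.0, which is the case the comment is about.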

greptile-apps · 2025-07-29T15:09:36Z

+                        [Function("sum", [Column("sample_weight")])],
+                        alias,
+                    ),
+                    default_result_type="number",


logic: The default_result_type is set to "number" but the comment and typical count operations suggest this should be "integer". This could cause type inconsistencies in query results.
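Whether "integer" is the better result type depends on how `sample_weight` is populated. If the weight is the reciprocal of the client sample rate (an assumption; the excerpt does not show how the weight is derived), a sum of weights is generally not a whole number. The sketch below uses hypothetical helper names and plain Python in place of the Snuba aggregate.

```python
import math
from typing import Iterable


def weight_for(client_sample_rate: float) -> float:
    """Hypothetical weight: each kept event stands in for 1/rate events."""
    if not 0.0 < client_sample_rate <= 1.0:
        raise ValueError("sample rate must be in (0, 1]")
    return 1.0 / client_sample_rate


def upsampled_count(sample_weights: Iterable[float]) -> float:
    """Python stand-in for sum(sample_weight) over the matching rows."""
    return math.fsum(sample_weights)


# Three stored events, each sampled at a rate of 0.4.
weights = [weight_for(0.4) for _ in range(3)]
total = upsampled_count(weights)  # close to 7.5, not a whole number
```

Under that assumption, declaring the result as an integer would either truncate the estimate or require explicit rounding in the query.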

greptile-apps · 2025-07-29T15:09:40Z

+                if upsampling_enabled:
+                    final_columns = transform_query_columns_for_error_upsampling(query_columns)


style: Transformation logic is duplicated in three places (lines 232-233, 276-277, 295-296). Consider extracting it into a helper function.
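One possible shape for such a helper is sketched below. The name `resolve_columns` is hypothetical, and `transform_columns_stand_in` is only a stand-in for `transform_query_columns_for_error_upsampling`: it reproduces the single `count()` rewrite visible elsewhere in this review and nothing more.

```python
from typing import Callable, List, Sequence


def transform_columns_stand_in(columns: Sequence[str]) -> List[str]:
    """Stand-in transform: rewrite count() and pass everything else through."""
    transformed: List[str] = []
    for column in columns:
        if column.lower() == "count()":
            transformed.append("upsampled_count() as count")
        else:
            transformed.append(column)
    return transformed


def resolve_columns(
    query_columns: Sequence[str],
    upsampling_enabled: bool,
    transform: Callable[[Sequence[str]], List[str]] = transform_columns_stand_in,
) -> List[str]:
    """Single place to decide which columns a query path should use."""
    if upsampling_enabled:
        return transform(query_columns)
    return list(query_columns)
```

Each of the three query paths could then call `resolve_columns(query_columns, upsampling_enabled)` instead of repeating the conditional.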

greptile-apps · 2025-07-29T15:09:55Z

+        if column_lower == "count()":
+            # Transform to upsampled count - assumes sample_weight column exists
+            # for all events in allowlisted projects per our data model requirements
+            transformed_columns.append("upsampled_count() as count")


logic: The transformation assumes upsampled_count() function exists in the database schema. This could cause query failures if the function isn't available.

greptile-apps · 2025-07-29T15:09:55Z

+    if cached_result is not None:
+        return cached_result and _should_apply_sample_weight_transform(dataset, request)


logic: Caching eligibility but not the dataset/request check could lead to inconsistent behavior - eligibility might be cached as true but dataset check could return different results between calls.
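The line under review combines a cached value with a check that runs on every call. A minimal sketch of that shape, using hypothetical names (`decide_upsampling`, `compute_eligibility`, `per_request_check`), shows which part is cached and which is not:

```python
from typing import Callable, Optional


def decide_upsampling(
    cached_eligibility: Optional[bool],
    compute_eligibility: Callable[[], bool],
    per_request_check: Callable[[], bool],
) -> bool:
    """Combine a possibly cached allowlist result with a per-request check."""
    if cached_eligibility is not None:
        # Cache hit: both True and False are valid cached answers.
        eligible = cached_eligibility
    else:
        # Cache miss: fall back to the expensive lookup.
        eligible = compute_eligibility()
    # `and` short-circuits, so the per-request check only runs when eligible.
    return eligible and per_request_check()
```

In this shape only the allowlist result can be stale. The dataset and query check is evaluated against the current request each time, so its outcome may legitimately differ between calls that share the same cached eligibility.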

greptile-apps · 2025-07-29T15:09:56Z

+    Check if a query is focused on error events.
+    Reduced to only check for event.type:error to err on the side of caution.
+    """
+    query = request.GET.get("query", "").lower()


style: Using simple string containment check for 'event.type:error' could match unintended queries like 'event.type:error_custom' or within quoted strings.
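A stricter match than plain containment could require `event.type:error` to appear as a whole whitespace-delimited token. The regular expression and the function name `is_error_focused` below are illustrative assumptions, not the PR's code, and this is still not a real query parser.

```python
import re

# Match event.type:error only when it is bounded by whitespace or the
# start/end of the string on both sides.
_ERROR_TYPE_TOKEN = re.compile(r"(?<!\S)event\.type:error(?!\S)", re.IGNORECASE)


def is_error_focused(query: str) -> bool:
    """Return True if the query contains event.type:error as a standalone token."""
    return _ERROR_TYPE_TOKEN.search(query) is not None
```

This rejects `event.type:error_custom`, the negated form `!event.type:error`, and a value directly wrapped in quotes. It would still match the token inside a longer quoted phrase containing spaces, and it would miss a term wrapped in parentheses. Handling those cases reliably would need the search syntax parser rather than a regular expression.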

GitHoobar · 2025-12-24T09:31:03Z

Review Summary

🔍 Comments beyond diff scope (1)

src/sentry/api/endpoints/organization_events_stats.py (1)

87-90: flatten_results uses quadratic list summation (sum([...], [])), which causes O(n^2) time and memory usage for large result sets, severely degrading performance as data scales.
Category: performance
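The pattern this comment describes, and a linear-time alternative, can be compared in a few lines. The function names are hypothetical, and the actual body of `flatten_results` is not shown in this review, so this sketches the general technique only.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def flatten_quadratic(nested: List[List[T]]) -> List[T]:
    """sum(..., []) builds a new list at every step, copying all prior items."""
    return sum(nested, [])


def flatten_linear(nested: Iterable[Iterable[T]]) -> List[T]:
    """chain.from_iterable visits each item once and builds one list."""
    return list(chain.from_iterable(nested))
```

Both return the same flattened list. The second avoids the repeated copying that makes list concatenation via `sum` grow quadratically with the number of sublists.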

mfeuerstein

PR Review — approved

Reviewed 8 files. 0 high-severity issues found. Verdict: approved.

`src/sentry/testutils/factories.py` (medium)

Reviewed src/sentry/testutils/factories.py — looks good

`pyproject.toml` (low)

Reviewed pyproject.toml — looks good

`src/sentry/search/events/datasets/discover.py` (low)

Reviewed src/sentry/search/events/datasets/discover.py — looks good

`src/sentry/api/helpers/error_upsampling.py` (low)

Reviewed src/sentry/api/helpers/error_upsampling.py — looks good

`sentry-repo` (low)

Reviewed sentry-repo — looks good

`tests/snuba/api/endpoints/test_organization_events_stats.py` (low)

Reviewed tests/snuba/api/endpoints/test_organization_events_stats.py — looks good

`tests/sentry/api/helpers/test_error_upsampling.py` (low)

Reviewed tests/sentry/api/helpers/test_error_upsampling.py — looks good

`src/sentry/api/endpoints/organization_events_stats.py` (low)

Reviewed src/sentry/api/endpoints/organization_events_stats.py — looks good

yuvmen and others added 2 commits July 25, 2025 09:48

greptile-apps Bot reviewed Jul 29, 2025


This was referenced Dec 31, 2025

feat(upsampling) - Support upsampled error count with performance optimizations AI-Code-Review-Evals/claude_code-sentry#3

Draft

feat(upsampling) - Support upsampled error count with performance optimizations AI-Code-Review-Evals/greptile-sentry#9

Open

wangyue6761 mentioned this pull request Jan 7, 2026

feat(upsampling) - Support upsampled error count with performance optimizations wangyue6761/sentry-greptile#3

Open

glittering-universe mentioned this pull request Jan 9, 2026

feat(upsampling) - Support upsampled error count with performance optimizations glittering-universe/sentry-costrict#3

Open

crcde8ug mentioned this pull request Mar 24, 2026

feat(upsampling) - Support upsampled error count with performance optimizations crcde8ug/sentry-greptile#3

Open

leehopper mentioned this pull request Apr 7, 2026

feat(upsampling) - Support upsampled error count with performance optimizations Code-Review-Assessment-2026/test_subset_6__sentry-greptile__claude__PR3__20260407#1

Open

mfeuerstein reviewed Apr 10, 2026


khaliqgant mentioned this pull request May 15, 2026

[MSD benchmark] feat(upsampling) - Support upsampled error count with performance optimizations MSD-Benchmarks/sentry-greptile#3

Open

amznUjjwalpa mentioned this pull request May 21, 2026

feat(upsampling) - Support upsampled error count with performance optimizations CloudSmith-Agent-Benchmark/sentry-greptile__PR3#1

Open

amznUjjwalpa mentioned this pull request Jun 9, 2026

feat(upsampling) - Support upsampled error count with performance optimizations CloudSmith-Release-Safety/sentry-greptile__PR3__PR1#1

Open

everettbu wants to merge 2 commits into master from error-upsampling-race-condition


4 participants
