feat(upsampling) - Support upsampled error count with performance optimizations by ShashankFC · Pull Request #8 · AI-Code-Review-Evals/cubic-sentry

ShashankFC · 2026-02-13T05:50:02Z

Test 3

Summary by CodeRabbit

Release Notes

New Features
- Introduced error upsampling support for organization event statistics queries, improving accuracy of error metrics for eligible projects through sample-weighted counting.
- Upsampling applies selectively based on project allowlisting and query type.
Tests
- Added comprehensive test coverage for error upsampling across various project allowlist scenarios and event dataset types.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Replicated from ai-code-review-evaluation/sentry-coderabbit#3

…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link Events-stats API will now check if all projects in the query are allowlisted for upsampling, and convert the count query to a sum over `sample_weight` in Snuba, this is done by defining a new SnQL function `upsampled_count()`. I noticed there are also eps() and epm() functions in use in this endpoint. I considered (and even worked on) also supporting swapping eps() and epm() which for correctness should probably also not count naively and use `sample_weight`, but this caused some complications and since they are only in use by specific dashboard widgets and not available in discover I decided to defer changing them until we realize it is needed.

- Add 60-second cache for upsampling eligibility checks to improve performance - Separate upsampling eligibility check from query transformation for better optimization - Remove unnecessary null checks in upsampled_count() function per schema requirements - Add cache invalidation utilities for configuration management This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.

ShashankFC · 2026-02-13T08:31:45Z

@cubic-dev-ai review this pull request

cubic-dev-ai · 2026-02-13T08:31:54Z

@cubic-dev-ai review this pull request

@ShashankFC I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

6 issues found across 8 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/sentry/api/endpoints/organization_events_stats.py">

<violation number="1" location="src/sentry/api/endpoints/organization_events_stats.py:221">
P1: Bug: Upsampling eligibility check uses outer-scope `dataset` instead of the `scoped_dataset` parameter. When `_get_event_stats` is called with a different dataset (e.g., during dashboard widget splitting where `discover` or `errors` is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.</violation>
</file>

<file name="src/sentry/search/events/datasets/discover.py">

<violation number="1" location="src/sentry/search/events/datasets/discover.py:1051">
P2: `default_result_type` should be `"integer"` instead of `"number"`. This function produces an Int64 count via `toInt64(sum(...))`, and all other count-like functions in this file use `"integer"` as their result type. Using `"number"` may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).</violation>

<violation number="2" location="src/sentry/search/events/datasets/discover.py:1051">
P1: Missing `private=True` — `upsampled_count` is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without `private=True`, any user can call `upsampled_count()` in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the `sample_weight` column. Other internal functions like `column_hash`, `rounded_timestamp`, and `examples` are correctly marked `private=True`.</violation>
</file>

<file name="tests/snuba/api/endpoints/test_organization_events_stats.py">

<violation number="1" location="tests/snuba/api/endpoints/test_organization_events_stats.py:3626">
P3: Misleading comments: the assertions check for `count == 10` (the upsampled value from `1 / 0.1` sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.</violation>
</file>

<file name="src/sentry/testutils/factories.py">

<violation number="1" location="src/sentry/testutils/factories.py:353">
P2: Bug: Truthiness check `if client_sample_rate:` silently skips falsy-but-valid values like `0` and `0.0`. A `client_sample_rate` of `0` or `0.0` is a valid numeric value that would be dropped here. Use `if client_sample_rate is not None:` instead to only skip when the key is genuinely absent.</violation>
</file>

<file name="src/sentry/api/helpers/error_upsampling.py">

<violation number="1" location="src/sentry/api/helpers/error_upsampling.py:27">
P1: Bug: `hash()` is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes `invalidate_upsampling_cache()` unable to invalidate entries created by other processes.

Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-02-13T08:37:55Z

+            # Early upsampling eligibility check for performance optimization
+            # This cached result ensures consistent behavior across query execution
+            should_upsample = is_errors_query_for_error_upsampled_projects(
+                snuba_params, organization, dataset, request


P1: Bug: Upsampling eligibility check uses outer-scope dataset instead of the scoped_dataset parameter. When _get_event_stats is called with a different dataset (e.g., during dashboard widget splitting where discover or errors is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/api/endpoints/organization_events_stats.py, line 221: <comment>Bug: Upsampling eligibility check uses outer-scope `dataset` instead of the `scoped_dataset` parameter. When `_get_event_stats` is called with a different dataset (e.g., during dashboard widget splitting where `discover` or `errors` is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.</comment> <file context> @@ -211,12 +215,28 @@ def _get_event_stats( + # Early upsampling eligibility check for performance optimization + # This cached result ensures consistent behavior across query execution + should_upsample = is_errors_query_for_error_upsampled_projects( + snuba_params, organization, dataset, request + ) + </file context>

Suggested change

snuba_params, organization, dataset, request

snuba_params, organization, scoped_dataset, request

cubic-dev-ai · 2026-02-13T08:37:56Z

+                        [Function("sum", [Column("sample_weight")])],
+                        alias,
+                    ),
+                    default_result_type="number",


P1: Missing private=True — upsampled_count is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without private=True, any user can call upsampled_count() in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the sample_weight column. Other internal functions like column_hash, rounded_timestamp, and examples are correctly marked private=True.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/search/events/datasets/discover.py, line 1051: <comment>Missing `private=True` — `upsampled_count` is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without `private=True`, any user can call `upsampled_count()` in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the `sample_weight` column. Other internal functions like `column_hash`, `rounded_timestamp`, and `examples` are correctly marked `private=True`.</comment> <file context> @@ -1038,6 +1038,18 @@ def function_converter(self) -> Mapping[str, SnQLFunction]: + [Function("sum", [Column("sample_weight")])], + alias, + ), + default_result_type="number", + ), ] </file context>

cubic-dev-ai · 2026-02-13T08:37:56Z

+    expensive repeated option lookups during high-traffic periods. This is safe
+    because allowlist changes are infrequent and eventual consistency is acceptable.
+    """
+    cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"


P1: Bug: hash() is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes invalidate_upsampling_cache() unable to invalidate entries created by other processes.

Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/api/helpers/error_upsampling.py, line 27: <comment>Bug: `hash()` is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes `invalidate_upsampling_cache()` unable to invalidate entries created by other processes. Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.</comment> <file context> @@ -0,0 +1,140 @@ + expensive repeated option lookups during high-traffic periods. This is safe + because allowlist changes are infrequent and eventual consistency is acceptable. + """ + cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}" + + # Check cache first for performance optimization </file context>

cubic-dev-ai · 2026-02-13T08:37:56Z

+                        [Function("sum", [Column("sample_weight")])],
+                        alias,
+                    ),
+                    default_result_type="number",


P2: default_result_type should be "integer" instead of "number". This function produces an Int64 count via toInt64(sum(...)), and all other count-like functions in this file use "integer" as their result type. Using "number" may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/search/events/datasets/discover.py, line 1051: <comment>`default_result_type` should be `"integer"` instead of `"number"`. This function produces an Int64 count via `toInt64(sum(...))`, and all other count-like functions in this file use `"integer"` as their result type. Using `"number"` may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).</comment> <file context> @@ -1038,6 +1038,18 @@ def function_converter(self) -> Mapping[str, SnQLFunction]: + [Function("sum", [Column("sample_weight")])], + alias, + ), + default_result_type="number", + ), ] </file context>

Suggested change

default_result_type="number",

default_result_type="integer",

cubic-dev-ai · 2026-02-13T08:37:56Z

+        )
+    except Exception:
+        pass
+    if client_sample_rate:


P2: Bug: Truthiness check if client_sample_rate: silently skips falsy-but-valid values like 0 and 0.0. A client_sample_rate of 0 or 0.0 is a valid numeric value that would be dropped here. Use if client_sample_rate is not None: instead to only skip when the key is genuinely absent.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/testutils/factories.py, line 353: <comment>Bug: Truthiness check `if client_sample_rate:` silently skips falsy-but-valid values like `0` and `0.0`. A `client_sample_rate` of `0` or `0.0` is a valid numeric value that would be dropped here. Use `if client_sample_rate is not None:` instead to only skip when the key is genuinely absent.</comment> <file context> @@ -341,6 +341,22 @@ def _patch_artifact_manifest(path, org=None, release=None, project=None, extra_f + ) + except Exception: + pass + if client_sample_rate: + try: + normalized_data["sample_rate"] = float(client_sample_rate) </file context>

Suggested change

if client_sample_rate:

if client_sample_rate is not None:

cubic-dev-ai · 2026-02-13T08:37:56Z

+        assert data[0][1][0]["count"] == 10  # First bucket has 1 event
+        assert data[1][1][0]["count"] == 10  # Second bucket has 1 event


P3: Misleading comments: the assertions check for count == 10 (the upsampled value from 1 / 0.1 sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At tests/snuba/api/endpoints/test_organization_events_stats.py, line 3626: <comment>Misleading comments: the assertions check for `count == 10` (the upsampled value from `1 / 0.1` sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.</comment> <file context> @@ -3549,3 +3549,174 @@ def test_top_events_with_error_unhandled(self): + assert response.status_code == 200, response.content + data = response.data["data"] + assert len(data) == 2 # Two time buckets + assert data[0][1][0]["count"] == 10 # First bucket has 1 event + assert data[1][1][0]["count"] == 10 # Second bucket has 1 event + </file context>

Suggested change

assert data[0][1][0]["count"] == 10 # First bucket has 1 event

assert data[1][1][0]["count"] == 10 # Second bucket has 1 event

assert data[0][1][0]["count"] == 10 # First bucket: 1 event upsampled by 1/0.1 = 10

assert data[1][1][0]["count"] == 10 # Second bucket: 1 event upsampled by 1/0.1 = 10

yuvmen and others added 2 commits July 25, 2025 09:48

cubic-dev-ai Bot reviewed Feb 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(upsampling) - Support upsampled error count with performance optimizations#8

feat(upsampling) - Support upsampled error count with performance optimizations#8
ShashankFC wants to merge 2 commits into
masterfrom
error-upsampling-race-condition

ShashankFC commented Feb 13, 2026

Uh oh!

ShashankFC commented Feb 13, 2026

Uh oh!

cubic-dev-ai Bot commented Feb 13, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	snuba_params, organization, dataset, request
	snuba_params, organization, scoped_dataset, request

		assert data[0][1][0]["count"] == 10 # First bucket has 1 event
		assert data[1][1][0]["count"] == 10 # Second bucket has 1 event

Conversation

ShashankFC commented Feb 13, 2026

Summary by CodeRabbit

Release Notes

Uh oh!

ShashankFC commented Feb 13, 2026

Uh oh!

cubic-dev-ai Bot commented Feb 13, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading

cubic-dev-ai Bot Feb 13, 2026 •

edited

Loading