Skip to content

feat(upsampling) - Support upsampled error count with performance optimizations#8

Open
ShashankFC wants to merge 2 commits into
masterfrom
error-upsampling-race-condition
Open

feat(upsampling) - Support upsampled error count with performance optimizations#8
ShashankFC wants to merge 2 commits into
masterfrom
error-upsampling-race-condition

Conversation

@ShashankFC

Copy link
Copy Markdown

Test 3

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced error upsampling support for organization event statistics queries, improving accuracy of error metrics for eligible projects through sample-weighted counting.
    • Upsampling applies selectively based on project allowlisting and query type.
  • Tests

    • Added comprehensive test coverage for error upsampling across various project allowlist scenarios and event dataset types.

✏️ Tip: You can customize this high-level summary in your review settings.


Replicated from ai-code-review-evaluation/sentry-coderabbit#3

yuvmen and others added 2 commits July 25, 2025 09:48
…(#94376)

Part of the Error Upsampling project:
https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

Events-stats API will now check if all projects in the query are
allowlisted for upsampling, and convert the count query to a sum over
`sample_weight` in Snuba, this is done by defining a new SnQL function
`upsampled_count()`.

I noticed there are also eps() and epm() functions in use in this
endpoint. I considered (and even worked on) also supporting
swapping eps() and epm() which for correctness should probably also not
count naively and use `sample_weight`, but this
caused some complications and since they are only in use by specific
dashboard widgets and not available in discover
I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated
expensive allowlist lookups while maintaining data consistency.
@ShashankFC

Copy link
Copy Markdown
Author

@cubic-dev-ai review this pull request

@cubic-dev-ai

cubic-dev-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

@cubic-dev-ai review this pull request

@ShashankFC I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 issues found across 8 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/sentry/api/endpoints/organization_events_stats.py">

<violation number="1" location="src/sentry/api/endpoints/organization_events_stats.py:221">
P1: Bug: Upsampling eligibility check uses outer-scope `dataset` instead of the `scoped_dataset` parameter. When `_get_event_stats` is called with a different dataset (e.g., during dashboard widget splitting where `discover` or `errors` is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.</violation>
</file>

<file name="src/sentry/search/events/datasets/discover.py">

<violation number="1" location="src/sentry/search/events/datasets/discover.py:1051">
P2: `default_result_type` should be `"integer"` instead of `"number"`. This function produces an Int64 count via `toInt64(sum(...))`, and all other count-like functions in this file use `"integer"` as their result type. Using `"number"` may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).</violation>

<violation number="2" location="src/sentry/search/events/datasets/discover.py:1051">
P1: Missing `private=True` — `upsampled_count` is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without `private=True`, any user can call `upsampled_count()` in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the `sample_weight` column. Other internal functions like `column_hash`, `rounded_timestamp`, and `examples` are correctly marked `private=True`.</violation>
</file>

<file name="tests/snuba/api/endpoints/test_organization_events_stats.py">

<violation number="1" location="tests/snuba/api/endpoints/test_organization_events_stats.py:3626">
P3: Misleading comments: the assertions check for `count == 10` (the upsampled value from `1 / 0.1` sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.</violation>
</file>

<file name="src/sentry/testutils/factories.py">

<violation number="1" location="src/sentry/testutils/factories.py:353">
P2: Bug: Truthiness check `if client_sample_rate:` silently skips falsy-but-valid values like `0` and `0.0`. A `client_sample_rate` of `0` or `0.0` is a valid numeric value that would be dropped here. Use `if client_sample_rate is not None:` instead to only skip when the key is genuinely absent.</violation>
</file>

<file name="src/sentry/api/helpers/error_upsampling.py">

<violation number="1" location="src/sentry/api/helpers/error_upsampling.py:27">
P1: Bug: `hash()` is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes `invalidate_upsampling_cache()` unable to invalidate entries created by other processes.

Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

# Early upsampling eligibility check for performance optimization
# This cached result ensures consistent behavior across query execution
should_upsample = is_errors_query_for_error_upsampled_projects(
snuba_params, organization, dataset, request

@cubic-dev-ai cubic-dev-ai Bot Feb 13, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Bug: Upsampling eligibility check uses outer-scope dataset instead of the scoped_dataset parameter. When _get_event_stats is called with a different dataset (e.g., during dashboard widget splitting where discover or errors is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/api/endpoints/organization_events_stats.py, line 221:

<comment>Bug: Upsampling eligibility check uses outer-scope `dataset` instead of the `scoped_dataset` parameter. When `_get_event_stats` is called with a different dataset (e.g., during dashboard widget splitting where `discover` or `errors` is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.</comment>

<file context>
@@ -211,12 +215,28 @@ def _get_event_stats(
+            # Early upsampling eligibility check for performance optimization
+            # This cached result ensures consistent behavior across query execution
+            should_upsample = is_errors_query_for_error_upsampled_projects(
+                snuba_params, organization, dataset, request
+            )
+            
</file context>
Suggested change
snuba_params, organization, dataset, request
snuba_params, organization, scoped_dataset, request
Fix with Cubic

[Function("sum", [Column("sample_weight")])],
alias,
),
default_result_type="number",

@cubic-dev-ai cubic-dev-ai Bot Feb 13, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Missing private=Trueupsampled_count is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without private=True, any user can call upsampled_count() in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the sample_weight column. Other internal functions like column_hash, rounded_timestamp, and examples are correctly marked private=True.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/search/events/datasets/discover.py, line 1051:

<comment>Missing `private=True` — `upsampled_count` is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without `private=True`, any user can call `upsampled_count()` in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the `sample_weight` column. Other internal functions like `column_hash`, `rounded_timestamp`, and `examples` are correctly marked `private=True`.</comment>

<file context>
@@ -1038,6 +1038,18 @@ def function_converter(self) -> Mapping[str, SnQLFunction]:
+                        [Function("sum", [Column("sample_weight")])],
+                        alias,
+                    ),
+                    default_result_type="number",
+                ),
             ]
</file context>
Fix with Cubic

expensive repeated option lookups during high-traffic periods. This is safe
because allowlist changes are infrequent and eventual consistency is acceptable.
"""
cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"

@cubic-dev-ai cubic-dev-ai Bot Feb 13, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Bug: hash() is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes invalidate_upsampling_cache() unable to invalidate entries created by other processes.

Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/api/helpers/error_upsampling.py, line 27:

<comment>Bug: `hash()` is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes `invalidate_upsampling_cache()` unable to invalidate entries created by other processes.

Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.</comment>

<file context>
@@ -0,0 +1,140 @@
+    expensive repeated option lookups during high-traffic periods. This is safe
+    because allowlist changes are infrequent and eventual consistency is acceptable.
+    """
+    cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
+    
+    # Check cache first for performance optimization
</file context>
Fix with Cubic

[Function("sum", [Column("sample_weight")])],
alias,
),
default_result_type="number",

@cubic-dev-ai cubic-dev-ai Bot Feb 13, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: default_result_type should be "integer" instead of "number". This function produces an Int64 count via toInt64(sum(...)), and all other count-like functions in this file use "integer" as their result type. Using "number" may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/search/events/datasets/discover.py, line 1051:

<comment>`default_result_type` should be `"integer"` instead of `"number"`. This function produces an Int64 count via `toInt64(sum(...))`, and all other count-like functions in this file use `"integer"` as their result type. Using `"number"` may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).</comment>

<file context>
@@ -1038,6 +1038,18 @@ def function_converter(self) -> Mapping[str, SnQLFunction]:
+                        [Function("sum", [Column("sample_weight")])],
+                        alias,
+                    ),
+                    default_result_type="number",
+                ),
             ]
</file context>
Suggested change
default_result_type="number",
default_result_type="integer",
Fix with Cubic

)
except Exception:
pass
if client_sample_rate:

@cubic-dev-ai cubic-dev-ai Bot Feb 13, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Bug: Truthiness check if client_sample_rate: silently skips falsy-but-valid values like 0 and 0.0. A client_sample_rate of 0 or 0.0 is a valid numeric value that would be dropped here. Use if client_sample_rate is not None: instead to only skip when the key is genuinely absent.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/testutils/factories.py, line 353:

<comment>Bug: Truthiness check `if client_sample_rate:` silently skips falsy-but-valid values like `0` and `0.0`. A `client_sample_rate` of `0` or `0.0` is a valid numeric value that would be dropped here. Use `if client_sample_rate is not None:` instead to only skip when the key is genuinely absent.</comment>

<file context>
@@ -341,6 +341,22 @@ def _patch_artifact_manifest(path, org=None, release=None, project=None, extra_f
+        )
+    except Exception:
+        pass
+    if client_sample_rate:
+        try:
+            normalized_data["sample_rate"] = float(client_sample_rate)
</file context>
Suggested change
if client_sample_rate:
if client_sample_rate is not None:
Fix with Cubic

Comment on lines +3626 to +3627
assert data[0][1][0]["count"] == 10 # First bucket has 1 event
assert data[1][1][0]["count"] == 10 # Second bucket has 1 event

@cubic-dev-ai cubic-dev-ai Bot Feb 13, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Misleading comments: the assertions check for count == 10 (the upsampled value from 1 / 0.1 sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/snuba/api/endpoints/test_organization_events_stats.py, line 3626:

<comment>Misleading comments: the assertions check for `count == 10` (the upsampled value from `1 / 0.1` sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.</comment>

<file context>
@@ -3549,3 +3549,174 @@ def test_top_events_with_error_unhandled(self):
+        assert response.status_code == 200, response.content
+        data = response.data["data"]
+        assert len(data) == 2  # Two time buckets
+        assert data[0][1][0]["count"] == 10  # First bucket has 1 event
+        assert data[1][1][0]["count"] == 10  # Second bucket has 1 event
+
</file context>
Suggested change
assert data[0][1][0]["count"] == 10 # First bucket has 1 event
assert data[1][1][0]["count"] == 10 # Second bucket has 1 event
assert data[0][1][0]["count"] == 10 # First bucket: 1 event upsampled by 1/0.1 = 10
assert data[1][1][0]["count"] == 10 # Second bucket: 1 event upsampled by 1/0.1 = 10
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants