feat(upsampling) - Support upsampled error count with performance optimizations#8
feat(upsampling) - Support upsampled error count with performance optimizations#8ShashankFC wants to merge 2 commits into
Conversation
…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link Events-stats API will now check if all projects in the query are allowlisted for upsampling, and convert the count query to a sum over `sample_weight` in Snuba, this is done by defining a new SnQL function `upsampled_count()`. I noticed there are also eps() and epm() functions in use in this endpoint. I considered (and even worked on) also supporting swapping eps() and epm() which for correctness should probably also not count naively and use `sample_weight`, but this caused some complications and since they are only in use by specific dashboard widgets and not available in discover I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance - Separate upsampling eligibility check from query transformation for better optimization - Remove unnecessary null checks in upsampled_count() function per schema requirements - Add cache invalidation utilities for configuration management This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
|
@cubic-dev-ai review this pull request |
@ShashankFC I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
6 issues found across 8 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/sentry/api/endpoints/organization_events_stats.py">
<violation number="1" location="src/sentry/api/endpoints/organization_events_stats.py:221">
P1: Bug: Upsampling eligibility check uses outer-scope `dataset` instead of the `scoped_dataset` parameter. When `_get_event_stats` is called with a different dataset (e.g., during dashboard widget splitting where `discover` or `errors` is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.</violation>
</file>
<file name="src/sentry/search/events/datasets/discover.py">
<violation number="1" location="src/sentry/search/events/datasets/discover.py:1051">
P2: `default_result_type` should be `"integer"` instead of `"number"`. This function produces an Int64 count via `toInt64(sum(...))`, and all other count-like functions in this file use `"integer"` as their result type. Using `"number"` may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).</violation>
<violation number="2" location="src/sentry/search/events/datasets/discover.py:1051">
P1: Missing `private=True` — `upsampled_count` is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without `private=True`, any user can call `upsampled_count()` in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the `sample_weight` column. Other internal functions like `column_hash`, `rounded_timestamp`, and `examples` are correctly marked `private=True`.</violation>
</file>
<file name="tests/snuba/api/endpoints/test_organization_events_stats.py">
<violation number="1" location="tests/snuba/api/endpoints/test_organization_events_stats.py:3626">
P3: Misleading comments: the assertions check for `count == 10` (the upsampled value from `1 / 0.1` sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.</violation>
</file>
<file name="src/sentry/testutils/factories.py">
<violation number="1" location="src/sentry/testutils/factories.py:353">
P2: Bug: Truthiness check `if client_sample_rate:` silently skips falsy-but-valid values like `0` and `0.0`. A `client_sample_rate` of `0` or `0.0` is a valid numeric value that would be dropped here. Use `if client_sample_rate is not None:` instead to only skip when the key is genuinely absent.</violation>
</file>
<file name="src/sentry/api/helpers/error_upsampling.py">
<violation number="1" location="src/sentry/api/helpers/error_upsampling.py:27">
P1: Bug: `hash()` is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes `invalidate_upsampling_cache()` unable to invalidate entries created by other processes.
Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| # Early upsampling eligibility check for performance optimization | ||
| # This cached result ensures consistent behavior across query execution | ||
| should_upsample = is_errors_query_for_error_upsampled_projects( | ||
| snuba_params, organization, dataset, request |
There was a problem hiding this comment.
P1: Bug: Upsampling eligibility check uses outer-scope dataset instead of the scoped_dataset parameter. When _get_event_stats is called with a different dataset (e.g., during dashboard widget splitting where discover or errors is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/api/endpoints/organization_events_stats.py, line 221:
<comment>Bug: Upsampling eligibility check uses outer-scope `dataset` instead of the `scoped_dataset` parameter. When `_get_event_stats` is called with a different dataset (e.g., during dashboard widget splitting where `discover` or `errors` is passed explicitly), the upsampling decision will be based on the wrong dataset. This can cause upsampling to be incorrectly skipped or incorrectly applied.</comment>
<file context>
@@ -211,12 +215,28 @@ def _get_event_stats(
+ # Early upsampling eligibility check for performance optimization
+ # This cached result ensures consistent behavior across query execution
+ should_upsample = is_errors_query_for_error_upsampled_projects(
+ snuba_params, organization, dataset, request
+ )
+
</file context>
| snuba_params, organization, dataset, request | |
| snuba_params, organization, scoped_dataset, request |
| [Function("sum", [Column("sample_weight")])], | ||
| alias, | ||
| ), | ||
| default_result_type="number", |
There was a problem hiding this comment.
P1: Missing private=True — upsampled_count is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without private=True, any user can call upsampled_count() in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the sample_weight column. Other internal functions like column_hash, rounded_timestamp, and examples are correctly marked private=True.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/search/events/datasets/discover.py, line 1051:
<comment>Missing `private=True` — `upsampled_count` is an internal function only used by the upsampling transformation, but it's publicly exposed to API users. Without `private=True`, any user can call `upsampled_count()` in their queries. This will fail or produce incorrect results on non-allowlisted projects that lack the `sample_weight` column. Other internal functions like `column_hash`, `rounded_timestamp`, and `examples` are correctly marked `private=True`.</comment>
<file context>
@@ -1038,6 +1038,18 @@ def function_converter(self) -> Mapping[str, SnQLFunction]:
+ [Function("sum", [Column("sample_weight")])],
+ alias,
+ ),
+ default_result_type="number",
+ ),
]
</file context>
| expensive repeated option lookups during high-traffic periods. This is safe | ||
| because allowlist changes are infrequent and eventual consistency is acceptable. | ||
| """ | ||
| cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}" |
There was a problem hiding this comment.
P1: Bug: hash() is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes invalidate_upsampling_cache() unable to invalidate entries created by other processes.
Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/api/helpers/error_upsampling.py, line 27:
<comment>Bug: `hash()` is non-deterministic across Python processes (PYTHONHASHSEED is randomized by default in Python 3). Since the cache backend is shared across workers (Django cache → Redis/Memcached), each process will generate a different cache key for the same set of project IDs. This breaks cross-process cache sharing and makes `invalidate_upsampling_cache()` unable to invalidate entries created by other processes.
Use a deterministic representation instead, such as a comma-joined sorted string or a hashlib digest.</comment>
<file context>
@@ -0,0 +1,140 @@
+ expensive repeated option lookups during high-traffic periods. This is safe
+ because allowlist changes are infrequent and eventual consistency is acceptable.
+ """
+ cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
+
+ # Check cache first for performance optimization
</file context>
| [Function("sum", [Column("sample_weight")])], | ||
| alias, | ||
| ), | ||
| default_result_type="number", |
There was a problem hiding this comment.
P2: default_result_type should be "integer" instead of "number". This function produces an Int64 count via toInt64(sum(...)), and all other count-like functions in this file use "integer" as their result type. Using "number" may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/search/events/datasets/discover.py, line 1051:
<comment>`default_result_type` should be `"integer"` instead of `"number"`. This function produces an Int64 count via `toInt64(sum(...))`, and all other count-like functions in this file use `"integer"` as their result type. Using `"number"` may lead to incorrect formatting (e.g., displaying decimal places for what should be a whole-number count).</comment>
<file context>
@@ -1038,6 +1038,18 @@ def function_converter(self) -> Mapping[str, SnQLFunction]:
+ [Function("sum", [Column("sample_weight")])],
+ alias,
+ ),
+ default_result_type="number",
+ ),
]
</file context>
| default_result_type="number", | |
| default_result_type="integer", |
| ) | ||
| except Exception: | ||
| pass | ||
| if client_sample_rate: |
There was a problem hiding this comment.
P2: Bug: Truthiness check if client_sample_rate: silently skips falsy-but-valid values like 0 and 0.0. A client_sample_rate of 0 or 0.0 is a valid numeric value that would be dropped here. Use if client_sample_rate is not None: instead to only skip when the key is genuinely absent.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/sentry/testutils/factories.py, line 353:
<comment>Bug: Truthiness check `if client_sample_rate:` silently skips falsy-but-valid values like `0` and `0.0`. A `client_sample_rate` of `0` or `0.0` is a valid numeric value that would be dropped here. Use `if client_sample_rate is not None:` instead to only skip when the key is genuinely absent.</comment>
<file context>
@@ -341,6 +341,22 @@ def _patch_artifact_manifest(path, org=None, release=None, project=None, extra_f
+ )
+ except Exception:
+ pass
+ if client_sample_rate:
+ try:
+ normalized_data["sample_rate"] = float(client_sample_rate)
</file context>
| if client_sample_rate: | |
| if client_sample_rate is not None: |
| assert data[0][1][0]["count"] == 10 # First bucket has 1 event | ||
| assert data[1][1][0]["count"] == 10 # Second bucket has 1 event |
There was a problem hiding this comment.
P3: Misleading comments: the assertions check for count == 10 (the upsampled value from 1 / 0.1 sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/snuba/api/endpoints/test_organization_events_stats.py, line 3626:
<comment>Misleading comments: the assertions check for `count == 10` (the upsampled value from `1 / 0.1` sample rate), but the comments say "has 1 event". Update comments to reflect that these are upsampled counts.</comment>
<file context>
@@ -3549,3 +3549,174 @@ def test_top_events_with_error_unhandled(self):
+ assert response.status_code == 200, response.content
+ data = response.data["data"]
+ assert len(data) == 2 # Two time buckets
+ assert data[0][1][0]["count"] == 10 # First bucket has 1 event
+ assert data[1][1][0]["count"] == 10 # Second bucket has 1 event
+
</file context>
| assert data[0][1][0]["count"] == 10 # First bucket has 1 event | |
| assert data[1][1][0]["count"] == 10 # Second bucket has 1 event | |
| assert data[0][1][0]["count"] == 10 # First bucket: 1 event upsampled by 1/0.1 = 10 | |
| assert data[1][1][0]["count"] == 10 # Second bucket: 1 event upsampled by 1/0.1 = 10 |
Test 3
Summary by CodeRabbit
Release Notes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.
Replicated from ai-code-review-evaluation/sentry-coderabbit#3