Skip to content

feat(upsampling) - Support upsampled error count with performance optimizations#13

Open
akshayutture-augment wants to merge 2 commits into
masterfrom
error-upsampling-race-condition
Open

feat(upsampling) - Support upsampled error count with performance optimizations#13
akshayutture-augment wants to merge 2 commits into
masterfrom
error-upsampling-race-condition

Conversation

@akshayutture-augment

@akshayutture-augment akshayutture-augment commented Nov 14, 2025

Copy link
Copy Markdown

Test 3

Summary by CodeRabbit

  • New Features

    • Added error upsampling functionality for improved query performance on error events. The system now intelligently applies sample-weight transformations based on per-organization project allowlists to scale aggregated error metrics appropriately.
    • Introduced new upsampled_count aggregation function for discovering upsampled error metrics.
  • Tests

    • Added comprehensive test coverage for error upsampling behavior including allowlist validation and dataset-specific handling.

✏️ Tip: You can customize this high-level summary in your review settings.


Replicated from ai-code-review-evaluation/sentry-coderabbit#3

yuvmen and others added 2 commits July 25, 2025 09:48
…(#94376)

Part of the Error Upsampling project:
https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

Events-stats API will now check if all projects in the query are
allowlisted for upsampling, and convert the count query to a sum over
`sample_weight` in Snuba, this is done by defining a new SnQL function
`upsampled_count()`.

I noticed there are also eps() and epm() functions in use in this
endpoint. I considered (and even worked on) also supporting
swapping eps() and epm() which for correctness should probably also not
count naively and use `sample_weight`, but this
caused some complications and since they are only in use by specific
dashboard widgets and not available in discover
I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated
expensive allowlist lookups while maintaining data consistency.
@akshayutture-augment

Copy link
Copy Markdown
Author

@coderabbitai review

@akshayutture-augment

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Nov 18, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Nov 18, 2025

Copy link
Copy Markdown

Walkthrough

Introduces error upsampling functionality to Sentry's event statistics system. Adds helper utilities to determine query eligibility based on organization allowlists and dataset type, applies transformations to rewrite count aggregations to upsampled_count, integrates these checks into the organization events stats endpoint, introduces a new upsampled_count function to the discover dataset, and includes comprehensive test coverage for the new functionality.

Changes

Cohort / File(s) Summary
Configuration & Build
pyproject.toml, sentry-repo
Updated mypy overrides and test references for error_upsampling modules; minor git submodule version update.
Error Upsampling Core
src/sentry/api/helpers/error_upsampling.py
New module providing error upsampling helper functions: eligibility checking via organization allowlists with 60-second caching, query column transformation (count → upsampled_count), sample weight decision logic for datasets (errors/transactions/discover), and cache invalidation utilities.
Query Transformation & Integration
src/sentry/api/endpoints/organization_events_stats.py, src/sentry/search/events/datasets/discover.py
Applies upsampling eligibility checks and column transformations in organization events stats endpoint across RPC and non-RPC query paths; adds upsampled_count aggregation function to discover dataset config.
Test Infrastructure
src/sentry/testutils/factories.py
Adds helper to propagate error_sampling.client_sample_rate to normalized event data in store_event flow.
Test Coverage
tests/sentry/api/helpers/test_error_upsampling.py, tests/snuba/api/endpoints/test_organization_events_stats.py
Comprehensive test suite for error upsampling eligibility, column transformation logic, query type detection, sample weight application rules, and end-to-end upsampling behavior across allowlist scenarios.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Endpoint as organization_events_stats<br/>endpoint
    participant Helper as error_upsampling<br/>helper
    participant Cache as Eligibility<br/>Cache
    participant Snuba
    
    Client->>Endpoint: GET /organization/events/stats
    Endpoint->>Helper: is_errors_query_for_error_upsampled_projects()
    Helper->>Cache: Check eligibility cache
    alt Cache Miss
        Cache-->>Helper: cache miss
        Helper->>Helper: _are_all_projects_error_upsampled()
        Helper->>Helper: _should_apply_sample_weight_transform()
        Helper->>Cache: Store result (60s TTL)
    else Cache Hit
        Cache-->>Helper: cached eligibility
    end
    Helper-->>Endpoint: upsampling_enabled: bool
    
    alt Upsampling Enabled
        Endpoint->>Helper: transform_query_columns_for_error_upsampling()
        Helper-->>Endpoint: count() → upsampled_count() as count
        Endpoint->>Snuba: Query with transformed columns
    else Upsampling Disabled
        Endpoint->>Snuba: Query with original columns
    end
    
    Snuba-->>Endpoint: Results
    Endpoint-->>Client: Stats response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • error_upsampling.py: Verify caching logic correctness, cache invalidation coverage, and decision-making across dataset types (errors/transactions/discover)
  • organization_events_stats.py: Confirm upsampling transformations applied at all query path junctures (RPC and non-RPC, before each query execution)
  • discover.py: Validate upsampled_count aggregation function definition and behavior (sample_weight summation via toInt64)
  • Test comprehensiveness: Ensure all allowlist scenarios, query types, and edge cases are covered across both unit and integration tests

Poem

A rabbit hops through Snuba's fields so bright, 🐰
Where upsampled counts now dance in light,
Sample weights ascend with cache so keen,
Error events glow in queries clean,
Allowlists guide the path just right!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.78% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main change: implementing support for upsampled error counts with performance optimizations, which aligns with the core functionality added across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch error-upsampling-race-condition

Comment @coderabbitai help to get the list of available commands and usage tips.

@akshayutture-augment

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Nov 22, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (6)
sentry-repo (1)

49-64: Span link sanitization and dropped-attribute accounting look solid

The new links handling, _sanitize_span_link, and corresponding test correctly preserve known attributes and accumulate sentry.dropped_attributes_count (including pre-existing drops). The JSON-encoding behavior is validated by the test and matches the intended storage model.

If you want to harden this further, you could defensively coerce sentry.dropped_attributes_count to int inside _sanitize_span_link to avoid unexpected types from Relay input, but that's strictly optional.

Also applies to: 74-100, 111-133

src/sentry/api/helpers/error_upsampling.py (1)

13-41: Error upsampling helpers are well-structured; a couple of small nits

The overall flow (allowlist check + short cache, dataset gating, and count()upsampled_count() rewrite) looks coherent and matches the intended behavior.

Two minor points to consider:

  1. Unused organization parameter in _are_all_projects_error_upsampled
    organization is currently unused. If the allowlist is meant to be organization-scoped, you probably want:

    allowlist = options.get(
        "issues.client_error_sampling.project_allowlist",
        [],
        organization=organization,
    )

    Otherwise, consider dropping the argument to avoid confusion.

  2. _is_error_focused_query uses a raw substring match
    Checking only "event.type:error" in query is simple but will also treat queries like !event.type:error or mixed-type queries as “error focused”. If you need stricter semantics later (e.g., “only errors”), you might want to tighten this to parse individual terms rather than doing a plain substring search.

Neither issue is blocking, but they’re worth keeping in mind as the feature evolves.

Also applies to: 43-65, 67-100, 103-140

src/sentry/search/events/datasets/discover.py (1)

1041-1052: upsampled_count implementation looks correct; tweak lambda arg to avoid lint noise

The aggregate toInt64(sum(sample_weight)) matches the intended upsampled-count semantics and aligns with the transform in transform_query_columns_for_error_upsampling.

You can silence the Ruff ARG005 warning and make intent clearer by ignoring the unused args parameter:

-                SnQLFunction(
-                    "upsampled_count",
-                    required_args=[],
-                    # Optimized aggregation for error upsampling - assumes sample_weight
-                    # exists for all events in allowlisted projects as per schema design
-                    snql_aggregate=lambda args, alias: Function(
-                        "toInt64",
-                        [Function("sum", [Column("sample_weight")])],
-                        alias,
-                    ),
-                    default_result_type="number",
-                ),
+                SnQLFunction(
+                    "upsampled_count",
+                    required_args=[],
+                    # Optimized aggregation for error upsampling - assumes sample_weight
+                    # exists for all events in allowlisted projects as per schema design
+                    snql_aggregate=lambda _, alias: Function(
+                        "toInt64",
+                        [Function("sum", [Column("sample_weight")])],
+                        alias,
+                    ),
+                    default_result_type="number",
+                ),
tests/sentry/api/helpers/test_error_upsampling.py (1)

19-101: Helper tests are solid; consider a small cache/invalidation round-trip test

The tests nicely cover allowlist evaluation, column transformation (including case/whitespace), and dataset/query gating.

If you want to future-proof the caching logic, an extra unit test exercising is_errors_query_for_error_upsampled_projects together with invalidate_upsampling_cache (e.g., toggling the allowlist and asserting the cached result updates after invalidation) would help catch regressions in the memoization layer without relying solely on higher-level endpoint tests.

tests/snuba/api/endpoints/test_organization_events_stats.py (2)

3554-3722: Consider expanding test coverage for edge cases.

The current test suite covers basic scenarios well, but consider adding tests for:

  • Events without error_sampling context (fallback behavior)
  • Different sample rates (0.01, 0.5, 1.0)
  • Edge case: client_sample_rate = 0 or = 1
  • Multiple yAxis parameters with upsampling (e.g., ["count()", "count_unique(user)"])
  • Integration with topEvents parameter
  • Comparison with comparisonDelta parameter

These additional test cases would ensure the error upsampling feature handles various scenarios correctly.


3617-3617: Clarify dataset parameter usage for consistency.

The test at line 3687 explicitly specifies "dataset": "discover" for transaction events, but other tests querying error events don't specify a dataset parameter. For clarity and consistency, consider explicitly specifying "dataset": "errors" in tests that query error events, or add a comment explaining the default dataset behavior.

Also applies to: 3641-3641, 3687-3687, 3711-3711

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a440b4 and 6ad6fe3.

📒 Files selected for processing (8)
  • pyproject.toml (2 hunks)
  • sentry-repo (1 hunks)
  • src/sentry/api/endpoints/organization_events_stats.py (6 hunks)
  • src/sentry/api/helpers/error_upsampling.py (1 hunks)
  • src/sentry/search/events/datasets/discover.py (1 hunks)
  • src/sentry/testutils/factories.py (3 hunks)
  • tests/sentry/api/helpers/test_error_upsampling.py (1 hunks)
  • tests/snuba/api/endpoints/test_organization_events_stats.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
src/sentry/api/helpers/error_upsampling.py

44-44: Unused function argument: organization

(ARG001)

src/sentry/search/events/datasets/discover.py

1046-1046: Unused lambda argument: args

(ARG005)

src/sentry/testutils/factories.py

351-352: try-except-pass detected, consider logging the exception

(S110)


351-351: Do not catch blind exception: Exception

(BLE001)


356-357: try-except-pass detected, consider logging the exception

(S110)


356-356: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (2)
pyproject.toml (1)

176-177: Aligning new helpers/tests with stronger typing is appropriate

Including sentry.api.helpers.error_upsampling and its tests in the stricter mypy overrides is consistent with the rest of the configuration and should help keep this logic well-typed.

Also applies to: 464-465

tests/snuba/api/endpoints/test_organization_events_stats.py (1)

3610-3621: Feature flags are not required for error upsampling.

The error upsampling functionality is controlled via configuration options (allowlist), not feature flags. The helpers in src/sentry/api/helpers/error_upsampling.py check options.get("issues.client_error_sampling.project_allowlist") and verify dataset/query type—there are no feature flag checks anywhere in this code path. The test methods correctly mock the options to simulate different allowlist states, making feature flag wrapping unnecessary. Unlike other tests in this file that feature-gate specific functionality, these tests verify option-controlled behavior, which is a different mechanism.

Likely an incorrect or invalid review comment.

Comment on lines +14 to +17
from sentry.api.helpers.error_upsampling import (
is_errors_query_for_error_upsampled_projects,
transform_query_columns_for_error_upsampling,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Upsampling eligibility should use scoped_dataset, not outer dataset

The new wiring in _get_event_stats correctly centralizes the upsampling decision and applies the column transform right before query execution. However, there’s an important edge case:

should_upsample = is_errors_query_for_error_upsampled_projects(
    snuba_params, organization, dataset, request
)

Here you pass the outer dataset, even though _get_event_stats is parameterized by scoped_dataset and get_event_stats_factory explicitly calls it with different datasets (notably split_dataset = discover when splitting a metrics-backed dashboard widget into error vs transaction-like queries).

When the original dataset is a metrics dataset and split_dataset is discover, _should_apply_sample_weight_transform inside is_errors_query_for_error_upsampled_projects will see the metrics dataset and return False, so upsampling is silently disabled for those split error queries even though they’re executed against discover where sample_weight is available.

You likely want to base the eligibility on the dataset actually being queried:

-            should_upsample = is_errors_query_for_error_upsampled_projects(
-                snuba_params, organization, dataset, request
-            )
+            should_upsample = is_errors_query_for_error_upsampled_projects(
+                snuba_params, organization, scoped_dataset, request
+            )

This keeps behavior unchanged in the simple path (where scoped_dataset == dataset), while enabling correct upsampling for the additional split queries the factory issues.

Also applies to: 124-124, 209-298, 326-499

🤖 Prompt for AI Agents
In src/sentry/api/endpoints/organization_events_stats.py around lines 14-17,
124, 209-298, and 326-499, the calls to
is_errors_query_for_error_upsampled_projects (and any related upsampling
eligibility checks) incorrectly pass the outer dataset variable; change these
calls to pass scoped_dataset (the dataset actually being queried) so the
upsampling eligibility and subsequent sample_weight/column transform logic use
the query-specific dataset. Update any nearby uses that infer dataset for
transform or eligibility to use scoped_dataset, and run tests to confirm
behavior is unchanged when scoped_dataset == dataset but correct for split
queries (e.g., metrics->discover).

from base64 import b64encode
from binascii import hexlify
from collections.abc import Mapping, Sequence
from collections.abc import Mapping, MutableMapping, Sequence

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Tighten _set_sample_rate_from_error_sampling semantics and exception handling

The propagation of contexts.error_sampling.client_sample_rate into sample_rate is useful, but there are two edge cases worth fixing:

  1. 0 sample rate is skipped
    Using if client_sample_rate: means a valid 0 (or "0") will be ignored. You likely want to treat 0 as a valid explicit value.

  2. Bare except Exception: pass hides malformed data
    Swallowing all exceptions without logging makes it hard to notice bad payloads. Narrowing the exception types keeps tests robust without masking real problems.

A minimal adjustment that preserves resilience while fixing these issues:

-def _set_sample_rate_from_error_sampling(normalized_data: MutableMapping[str, Any]) -> None:
-    """Set 'sample_rate' on normalized_data if contexts.error_sampling.client_sample_rate is present and valid."""
-    client_sample_rate = None
-    try:
-        client_sample_rate = (
-            normalized_data.get("contexts", {}).get("error_sampling", {}).get("client_sample_rate")
-        )
-    except Exception:
-        pass
-    if client_sample_rate:
-        try:
-            normalized_data["sample_rate"] = float(client_sample_rate)
-        except Exception:
-            pass
+def _set_sample_rate_from_error_sampling(normalized_data: MutableMapping[str, Any]) -> None:
+    """Set 'sample_rate' on normalized_data if contexts.error_sampling.client_sample_rate is present and valid."""
+    client_sample_rate = (
+        normalized_data.get("contexts", {})
+        .get("error_sampling", {})
+        .get("client_sample_rate")
+    )
+    if client_sample_rate is not None:
+        try:
+            normalized_data["sample_rate"] = float(client_sample_rate)
+        except (TypeError, ValueError):
+            # Ignore invalid values but don't break tests
+            pass

The call in store_event then benefits from the stricter but still safe behavior without further changes.

Also applies to: 344-357, 1047-1050

🤖 Prompt for AI Agents
In src/sentry/testutils/factories.py around lines 11 (and also apply same
changes to 344-357 and 1047-1050), tighten
`_set_sample_rate_from_error_sampling` by treating explicit zero as a valid
sample rate (replace the truthy check like `if client_sample_rate:` with an
explicit None/empty check so values 0 and "0" are accepted) and remove the bare
`except Exception:` in favor of narrowly catching expected parsing/errors (e.g.,
ValueError, TypeError, KeyError) and surface or log the error instead of
silently passing; update the handling accordingly so malformed data does not get
silently swallowed while preserving resilience.

"event_id": "a" * 32,
"message": "very bad",
"type": "error",
"exception": [{"type": "ValueError", "value": "Something went wrong 1"}],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix the exception field structure.

The exception field should be a dictionary with a values key, not a direct list. The correct format is:

"exception": {
    "values": [{"type": "ValueError", "value": "Something went wrong 1"}]
}

This matches the Sentry event schema and the pattern used elsewhere in this file (see lines 2479-2487).

Apply this diff:

-                "exception": [{"type": "ValueError", "value": "Something went wrong 1"}],
+                "exception": {
+                    "values": [{"type": "ValueError", "value": "Something went wrong 1"}]
+                },

And similarly for line 3588:

-                "exception": [{"type": "ValueError", "value": "Something went wrong 2"}],
+                "exception": {
+                    "values": [{"type": "ValueError", "value": "Something went wrong 2"}]
+                },

Also applies to: 3588-3588

🤖 Prompt for AI Agents
In tests/snuba/api/endpoints/test_organization_events_stats.py around lines 3575
and 3588, the test uses "exception" as a direct list but it must be an object
with a "values" key per Sentry schema; change each "exception": [ ... ] to
"exception": {"values": [ ... ]} so the exception payload matches the expected
structure used elsewhere in the file.

Comment on lines +3626 to +3627
assert data[0][1][0]["count"] == 10 # First bucket has 1 event
assert data[1][1][0]["count"] == 10 # Second bucket has 1 event

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify misleading comments.

The comments state "First bucket has 1 event" and "Second bucket has 1 event", but the assertions expect count == 10. The comments should clarify that each bucket contains 1 sampled event that is upsampled 10x due to the client_sample_rate: 0.1.

Apply this diff:

-        assert data[0][1][0]["count"] == 10  # First bucket has 1 event
-        assert data[1][1][0]["count"] == 10  # Second bucket has 1 event
+        assert data[0][1][0]["count"] == 10  # First bucket: 1 event upsampled 10x (sample_rate=0.1)
+        assert data[1][1][0]["count"] == 10  # Second bucket: 1 event upsampled 10x (sample_rate=0.1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert data[0][1][0]["count"] == 10 # First bucket has 1 event
assert data[1][1][0]["count"] == 10 # Second bucket has 1 event
assert data[0][1][0]["count"] == 10 # First bucket: 1 event upsampled 10x (sample_rate=0.1)
assert data[1][1][0]["count"] == 10 # Second bucket: 1 event upsampled 10x (sample_rate=0.1)
🤖 Prompt for AI Agents
In tests/snuba/api/endpoints/test_organization_events_stats.py around lines 3626
to 3627, the inline comments are misleading: they say "First/Second bucket has 1
event" while the assertions expect count == 10 due to upsampling from
client_sample_rate: 0.1. Update the comments to explain that each bucket
contains 1 sampled event which is upsampled by a factor of 10
(client_sample_rate 0.1) resulting in count == 10, so replace the existing
comments with a concise note describing sampled event + upsampling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants