feat(upsampling) - Support upsampled error count with performance optimizations by zaibkhan · Pull Request #4 · codoki-ai/sentry-codoki

zaibkhan · 2025-09-03T13:05:45Z

This PR introduces upsampled error count support in the events-stats API, improving error metrics accuracy under race conditions. It also includes performance optimizations with caching to reduce response latency.

Key changes:

Add upsampled error count path to events-stats
Mitigate race conditions around sampling computations
Add caching/perf improvements for hot endpoints
Update tests to cover the new path

Benchmark reference:
Replicated from ai-code-review-evaluation/sentry-greptile PR #3

…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link Events-stats API will now check if all projects in the query are allowlisted for upsampling, and convert the count query to a sum over `sample_weight` in Snuba, this is done by defining a new SnQL function `upsampled_count()`. I noticed there are also eps() and epm() functions in use in this endpoint. I considered (and even worked on) also supporting swapping eps() and epm() which for correctness should probably also not count naively and use `sample_weight`, but this caused some complications and since they are only in use by specific dashboard widgets and not available in discover I decided to defer changing them until we realize it is needed.

- Add 60-second cache for upsampling eligibility checks to improve performance - Separate upsampling eligibility check from query transformation for better optimization - Remove unnecessary null checks in upsampled_count() function per schema requirements - Add cache invalidation utilities for configuration management This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.

codoki-pr-intelligence · 2025-09-03T14:23:59Z

Codoki PR Review

Summary: Register upsampled_count, fix cache key stability, apply safe transforms
What’s good: Clear separation of concerns: eligibility check, dataset guard, and late column transformation minimize unintended scope. Integration tests exercise the new path across errors/transactions and allowlist mixes.
Review Status: ❌ Requires changes
Overall Priority: High

Issues (Critical & High only)

Severity	Issue	Why it matters	File
High	Correctness — upsampled_count not registered in production registry	upsampled_count is only registered in the test factory but not in the production function registry (e.g., sentry.snuba.functions). In production, this will cause query parsing to fail when the transform is applied, breaking the endpoint for allowlisted orgs.	`…/helpers/error_upsampling.py`

Key Feedback (click to expand)

Needs improvement: Cache key construction uses Python's hash(), which is non-deterministic across processes and will break shared caching/invalidation; switch to a deterministic key. The new upsampling function is only registered in test factories, so live queries will error where the transform is applied.
Testing: Integration tests pass by registering upsampled_count in test factories, masking the fact that production function registry lacks this function. Add a test that runs through the real function registry (e.g., via sentry.snuba.functions) to ensure 'upsampled_count' is defined in production code.
Documentation: Consider documenting in error_upsampling.py where/when invalidate_upsampling_cache must be triggered (e.g., option change hook), and that upsampled_count must be present in the production function registry.
Performance: Using Python's hash() in cache keys prevents cross-process cache reuse and breaks invalidation; use a deterministic join or stable digest of project IDs to enable shared caching and reliable eviction.
Open questions: What mechanism will call invalidate_upsampling_cache when the allowlist option changes? If none exists, should we reduce TTL or wire a signal to ensure consistency?

Confidence: 3/5 — Needs work before merge (1 high · status: Requires changes)

Sequence Diagram

sequenceDiagram
    participant Client
    participant Endpoint
    participant Helper
    Client->>Endpoint: GET /organization-events-stats
    Endpoint->>Helper: is_errors_query_for_error_upsampled_projects(snuba_params, org, dataset, request)
    alt cache hit
        Helper-->>Endpoint: cached_result && _should_apply_sample_weight_transform()
    else cache miss
        Helper->>Helper: _are_all_projects_error_upsampled()
        Helper->>Helper: cache.set(key, is_eligible, 60)
        Helper-->>Endpoint: is_eligible && _should_apply_sample_weight_transform()
    end
    alt should_upsample
        Endpoint->>Endpoint: transform_query_columns_for_error_upsampling()
    end
    Endpoint-->>Client: timeseries response

React with 👍 or 👎 if you found this review useful.

codoki-pr-intelligence · 2025-09-03T14:26:29Z

+    expensive repeated option lookups during high-traffic periods. This is safe
+    because allowlist changes are infrequent and eventual consistency is acceptable.
+    """
+    cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"


🔷 Medium: Using Python's hash() in a cache key is non-deterministic across processes (hash randomization) and will break shared cache reuse and invalidation, leading to inconsistent behavior and missed performance benefits.

Suggested change

cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"

cache_key = f"error_upsampling_eligible:{organization.id}:{','.join(map(str, sorted(snuba_params.project_ids)))}"

codoki-pr-intelligence · 2025-09-03T14:26:30Z

+    This should be called when the allowlist configuration changes to ensure
+    cache consistency across the system.
+    """
+    cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}"


🔷 Medium: Same issue as above: invalidation key must match the setting key across processes; using hash() prevents reliable eviction.

Suggested change

cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}"

cache_key = f"error_upsampling_eligible:{organization_id}:{','.join(map(str, sorted(project_ids)))}"

codoki-pr-intelligence · 2025-09-03T14:26:32Z

+        if column_lower == "count()":
+            # Transform to upsampled count - assumes sample_weight column exists
+            # for all events in allowlisted projects per our data model requirements
+            transformed_columns.append("upsampled_count() as count")


⚠️ High: upsampled_count is only registered in the test factory but not in the production function registry (e.g., sentry.snuba.functions). In production, this will cause query parsing to fail when the transform is applied, breaking the endpoint for allowlisted orgs.

yuvmen and others added 2 commits July 25, 2025 09:48

codoki-ai deleted a comment from codoki-pr-intelligence Bot Sep 3, 2025

chore: trigger Codoki re-review

534aa4c

codoki-pr-intelligence Bot reviewed Sep 3, 2025

View reviewed changes

zaibkhan mentioned this pull request Sep 3, 2025

fix(security): validate GitHub user during integration installation #5

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(upsampling) - Support upsampled error count with performance optimizations#4

feat(upsampling) - Support upsampled error count with performance optimizations#4
zaibkhan wants to merge 3 commits into
masterfrom
error-upsampling-race-condition

zaibkhan commented Sep 3, 2025

Uh oh!

codoki-pr-intelligence Bot commented Sep 3, 2025 •

edited

Loading

Uh oh!

codoki-pr-intelligence Bot Sep 3, 2025

Uh oh!

codoki-pr-intelligence Bot Sep 3, 2025

Uh oh!

codoki-pr-intelligence Bot Sep 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
	cache_key = f"error_upsampling_eligible:{organization.id}:{','.join(map(str, sorted(snuba_params.project_ids)))}"

Conversation

zaibkhan commented Sep 3, 2025

Uh oh!

codoki-pr-intelligence Bot commented Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codoki PR Review

Sequence Diagram

Uh oh!

codoki-pr-intelligence Bot Sep 3, 2025

Choose a reason for hiding this comment

Uh oh!

codoki-pr-intelligence Bot Sep 3, 2025

Choose a reason for hiding this comment

Uh oh!

codoki-pr-intelligence Bot Sep 3, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codoki-pr-intelligence Bot commented Sep 3, 2025 •

edited

Loading