Optimize spans buffer insertion with eviction during insert#9
Optimize spans buffer insertion with eviction during insert#9Frawatson wants to merge 4 commits into
Conversation
A proof of concept that limits the number of spans per segment during insertion. Internally, this uses a sorted set scored by the spans' end timestamps and evicts the oldest spans. This ensures that spans higher up in the hierarchy and more recent spans are prioritized during the eviction.
…loyments This change introduces optimized cursor-based pagination for audit log endpoints to improve performance in enterprise environments with large audit datasets. Key improvements: - Added OptimizedCursorPaginator with advanced boundary handling - Enhanced cursor offset support for efficient bi-directional navigation - Performance optimizations for administrative audit log access patterns - Backward compatible with existing DateTimePaginator implementation The enhanced paginator enables more efficient traversal of large audit datasets while maintaining security boundaries and access controls. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
@grapple-pr review |
Agent: architecture Severity: critical Confidence: 100% Files: src/sentry/api/paginator.py Applied by Grapple PR auto-fix https://www.grapple-pr.com Grapple-PR-Autofix: true
🤖 Grapple PR ReviewSummaryThis PR modifies 9 files. Title: "Optimize spans buffer insertion with eviction during insert" Changes: 9 files (+183 / -29 lines)
Resolves: #2 (feat(uptime): Add ability to use queues to manage parallelism) CI: ⏳ 0 passed, 0 failed Blast radius: 🔴 High (score: 100/100 · cross-module)
Impact Diagramgraph TD
pr_0[/"🔀 PR Changes: 9 files"/]
f1["🟡 organization_auditlogs.py +25/-8"]
pr_0 --> f1
f2["🟡 paginator.py +101/-2"]
pr_0 --> f2
f3["🟡 add-buffer.lua +15/-5"]
pr_0 --> f3
f4["🟡 buffer.py +8/-13"]
pr_0 --> f4
f5["🟡 factory.py +4/-1"]
pr_0 --> f5
f6["🟡 cursors.py +2/-0"]
pr_0 --> f6
f7["🟡 test_consumer.py +2/-0"]
pr_0 --> f7
f8["🟡 test_flusher.py +4/-0"]
pr_0 --> f8
f9["🟡 test_buffer.py +22/-0"]
pr_0 --> f9
f7 -.- f8
b10["⚡ organization_events_facets.py"]
b11["⚡ group_tombstone.py"]
b12["⚡ project_release_files.py"]
b13["⚡ organization_index.py"]
b14["⚡ organization_repository_commits.py"]
b15["⚡ organization_events.py"]
bm_16["... +61 more affected"]
f1 -->|affects| b10
f2 -->|affects| b11
f3 -->|affects| b12
f4 -->|affects| b13
f5 -->|affects| b14
f6 -->|affects| b15
f7 -->|affects| bm_16
warn_17["⚠️ Cross-module impact"]
f1 -->|cross-module| warn_17
style f1 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f2 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f3 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f4 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f5 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f6 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f7 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f8 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f9 fill:#7c3aed,color:#fff,stroke:#7c3aed
Risk score: 579 ➖ 579 → 579 (0, Unchanged vs. previous review)
📍 10 findings — inline posting failed (file may have shifted)These findings target lines in the PR diff but GitHub couldn't post them as inline comments — the file structure shifted between review and post time. Listed here so they're not lost.
Review details
Powered by Grapple PR — AI that understands your code. |
Agent: architecture Severity: critical Confidence: 100% Files: src/sentry/api/paginator.py Applied by Grapple PR auto-fix https://www.grapple-pr.com Grapple-PR-Autofix: true
There was a problem hiding this comment.
🤖 Grapple PR Review
Summary
This PR modifies 9 files. Title: "Optimize spans buffer insertion with eviction during insert"
Changes: 9 files (+174 / -29 lines)
- 🟡 9 files modified:
src/sentry/api/endpoints/organization_auditlogs.py,src/sentry/api/paginator.py,src/sentry/scripts/spans/add-buffer.lua,src/sentry/spans/buffer.py,src/sentry/spans/consumers/process/factory.py+4 more
Resolves: #2 (feat(uptime): Add ability to use queues to manage parallelism)
CI: ⏳ 0 passed, 0 failed
Blast radius: 🔴 High (score: 100/100 · cross-module)
- 67 files affected outside this PR
- 9 functions in blast zone
Impact Diagram
graph TD
pr_0[/"🔀 PR Changes: 9 files"/]
f1["🟡 organization_auditlogs.py +25/-8"]
pr_0 --> f1
f2["🟡 paginator.py +92/-2"]
pr_0 --> f2
f3["🟡 add-buffer.lua +15/-5"]
pr_0 --> f3
f4["🟡 buffer.py +8/-13"]
pr_0 --> f4
f5["🟡 factory.py +4/-1"]
pr_0 --> f5
f6["🟡 cursors.py +2/-0"]
pr_0 --> f6
f7["🟡 test_consumer.py +2/-0"]
pr_0 --> f7
f8["🟡 test_flusher.py +4/-0"]
pr_0 --> f8
f9["🟡 test_buffer.py +22/-0"]
pr_0 --> f9
f7 -.- f8
b10["⚡ organization_events_facets.py"]
b11["⚡ group_tombstone.py"]
b12["⚡ project_release_files.py"]
b13["⚡ organization_index.py"]
b14["⚡ organization_repository_commits.py"]
b15["⚡ organization_events.py"]
bm_16["... +61 more affected"]
f1 -->|affects| b10
f2 -->|affects| b11
f3 -->|affects| b12
f4 -->|affects| b13
f5 -->|affects| b14
f6 -->|affects| b15
f7 -->|affects| bm_16
warn_17["⚠️ Cross-module impact"]
f1 -->|cross-module| warn_17
style f1 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f2 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f3 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f4 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f5 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f6 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f7 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f8 fill:#7c3aed,color:#fff,stroke:#7c3aed
style f9 fill:#7c3aed,color:#fff,stroke:#7c3aed
2 critical issues found — please address before merging.
| Severity | Count |
|---|---|
| 🔴 Critical | 2 |
| 🟠 Major | 8 |
| 🟡 Minor | 2 |
| 🔵 Info | 5 |
Risk score: 415 📉 479 → 415 (−64, Improved vs. previous review)
⚠️ 3 findings may have been introduced by a previously applied autofix. These are new issues at locations where fixes were just applied — review carefully before applying more fixes.
Review details
- Reviewed at: 2026-04-30T01:43:02.140Z
- Agents used: security, logic, style, architecture, performance, verification
- Context level: full
- Execution time: 93.8s
- Low-confidence findings suppressed: 3
- Blast radius: 100 (cross-module)
Powered by Grapple PR — AI that understands your code.
| if set_span_id ~= parent_span_id and redis.call("zcard", parent_key) > 0 then | ||
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) | ||
| redis.call("unlink", parent_key) | ||
| end |
There was a problem hiding this comment.
🔴 CRITICAL — Data Structure Migration Safety (confidence: 100%)
The Lua script switches from Redis SET commands (scard, sunionstore, sadd) to SORTED SET commands (zcard, zunionstore, zadd) without any migration strategy. Any existing data in Redis using the old SET data structure at keys matching 'span-buf:s:*' will be incompatible with the new SORTED SET operations. Running ZCARD or ZUNIONSTORE against a key that holds a SET (not a ZSET) will return a WRONGTYPE error, causing runtime failures for any in-flight or buffered spans during deployment.
Evidence:
- scard -> zcard, sunionstore -> zunionstore are not compatible across Redis data types
- The key format 'span-buf:s:{project_and_trace}:span_id' is unchanged, so existing SET keys will still exist when new code tries ZSET operations
- No migration script or key-type check is present
- buffer.py also changes from sadd/sscan to zadd/zscan on the same keys
Agent: architecture
| # This is safe because the underlying queryset will handle boundary conditions | ||
| start_offset = max(0, offset) if not cursor.is_prev else offset | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) |
There was a problem hiding this comment.
🔴 CRITICAL — API Contract Violation (confidence: 100%)
The change to DateTimePaginator.get_result introduces different offset calculation logic for cursor.is_prev (removing the max(0, offset) guard for prev cursors). When cursor.is_prev is True, 'offset' can be negative, and passing a negative index to Django QuerySet slicing (queryset[negative:stop]) raises AssertionError ('Negative indexing is not supported.'). The original code used a single 'offset' variable that was always non-negative due to how cursors are constructed, but the new branching logic explicitly allows negative start_offset for prev cursors.
Evidence:
- Original code: stop = offset + limit + extra; results = list(queryset[offset:stop])
- New code: start_offset = max(0, offset) if not cursor.is_prev else offset — when cursor.is_prev is True, offset is used directly without clamping
- Django QuerySet.getitem raises AssertionError on negative indices
- The comment says 'allow negative offsets' but Django does not support this
- This modifies DateTimePaginator which is used by 76+ files per blast radius analysis
Agent: architecture
| ) | ||
| else: | ||
| response = self.paginate( | ||
| request=request, |
There was a problem hiding this comment.
🟠 MAJOR — Runtime Error - Unknown kwarg silently dropped or TypeError (confidence: 100%)
The call to self.paginate(..., enable_advanced_features=True) passes a custom keyword argument to the base paginate() method. If paginate() does not forward **kwargs to the paginator constructor, this argument is either silently dropped (making the feature flag meaningless) or causes a TypeError. The OptimizedCursorPaginator.__init__ does accept enable_advanced_features, but the plumbing through paginate() must also support it.
Evidence:
- Line 86:
enable_advanced_features=Truepassed toself.paginate(). OptimizedCursorPaginator.__init__at paginator.py:833 acceptsenable_advanced_features=Falseas a kwarg.- If
BasePaginator/paginate()ignores unknown kwargs, the advanced feature is never enabled despite the privilege check at line 74. - Alternatively if
paginate()rejects unknown kwargs, every admin request using this path will return HTTP 500.
Agent: security
| hits = None | ||
|
|
||
| offset = cursor.offset | ||
| extra = 1 |
There was a problem hiding this comment.
🟠 MAJOR — Logic Error - Dead conditional branch (confidence: 100%)
In OptimizedCursorPaginator.get_result(), the start_offset computation has two branches of a ternary that are identical: max(0, offset) if not cursor.is_prev else max(0, offset). Both branches clamp to 0 unconditionally, so the condition is dead code. The comment above claims different behavior for is_prev, but the implementation contradicts it. This means reverse-pagination with a positive offset will silently start from 0 instead of the correct position, producing wrong results.
Evidence:
- Line 868:
start_offset = max(0, offset) if not cursor.is_prev else max(0, offset)— both branches are identical. - The parent class
CursorPaginator.get_result()usesoffsetdirectly (after themax(0, offset)guard introduced at line 179), intentionally differing foris_prev. - This was likely intended to mirror the fix at paginator.py:179 but was copy-pasted incorrectly, leaving the
is_prevbranch wrong. - The class docstring claims 'negative offset support for efficient reverse pagination' but the implementation clamps both paths to
max(0, offset). - Complexity score for paginator.py is already 80/100 per codebase intelligence.
- The
enable_advanced_featuresflag is stored but never actually used in any code path within the class. - self.enable_advanced_features is set in init but never referenced in get_result or any other method
- The get_result method is functionally identical to the base paginator's logic
- The 'negative offset' comment in the class docstring is misleading — the implementation clamps to max(0, offset) anyway
Agent: security
| payload=payload.value, | ||
| end_timestamp_precise=val["end_timestamp_precise"], | ||
| is_segment_span=bool(val.get("parent_span_id") is None or val.get("is_remote")), | ||
| ) |
There was a problem hiding this comment.
🟠 MAJOR — Runtime Error - Missing field in SpanEvent schema (confidence: 100%)
The code accesses val["end_timestamp_precise"] with direct dict indexing (no .get() with default). If any span message in Kafka does not contain the end_timestamp_precise field, this will raise a KeyError and crash the consumer. While the cast to SpanEvent suggests the schema should have this field, cast provides no runtime validation - it's purely a type hint. If there are any older messages in the Kafka topic or producers that haven't been updated, the consumer will fail.
Evidence:
- Other optional fields like parent_span_id use .get() with no default, but end_timestamp_precise uses direct indexing val["end_timestamp_precise"]
- cast() does not perform runtime validation, so malformed messages will cause KeyError
Agent: logic
| del cursors[key] | ||
| continue | ||
|
|
||
| payloads[key].extend(span for span, _ in zscan_values) |
There was a problem hiding this comment.
🟠 MAJOR — Data Integrity - Removed span count limit check in _load_segment_data (confidence: 100%)
The code removes the max_segment_spans check that previously prevented loading segments with too many spans. While the Lua script now caps at 1000 spans via zpopmin, the _load_segment_data method no longer has any span count protection. If the eviction in Lua fails to fire (e.g., during the migration window where old set keys exist alongside new sorted set keys), or if max_segment_spans was set to a value lower than 1000, arbitrarily large segments could be loaded into memory. The byte-size check remains but the span-count check is gone.
Evidence:
- Lines 446-453 in the old code had a max_segment_spans check that was deleted
- The Lua eviction is a soft cap - it only fires during insert, not during reads, so pre-existing large segments are not protected
Agent: logic
| paginator_cls=DateTimePaginator, | ||
| order_by="-datetime", | ||
| on_results=lambda x: serialize(x, request.user), | ||
| ) |
There was a problem hiding this comment.
🟠 MAJOR — Module Boundary Violation (confidence: 100%)
The audit logs endpoint now imports and uses OptimizedCursorPaginator, a new paginator class, but this change is unrelated to the PR's stated purpose of optimizing spans buffer insertion. The audit logs endpoint is a control silo endpoint owned by different teams. Introducing an authorization-gated alternate pagination path (checking is_superuser and has_global_access) in an audit log endpoint as part of a spans buffer optimization PR is a suspicious scope creep that mixes concerns and increases the attack surface of a security-sensitive endpoint.
Evidence:
- PR title is 'Optimize spans buffer insertion with eviction during insert' but this file is about audit logs
- New code gates on request.GET.get('optimized_pagination') == 'true' and superuser/global access checks — this is a new API behavior contract
- enable_advanced_features=True is passed to paginate() but BasePaginator.paginate() does not accept this kwarg — this will raise TypeError at runtime
- OptimizedCursorPaginator duplicates most of DateTimePaginator logic without clear justification
Agent: architecture
| if set_span_id ~= parent_span_id and redis.call("scard", parent_key) > 0 then | ||
| redis.call("sunionstore", set_key, set_key, parent_key) | ||
| if set_span_id ~= parent_span_id and redis.call("zcard", parent_key) > 0 then | ||
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) |
There was a problem hiding this comment.
🟠 MAJOR — Span Count Limit / Data Loss (confidence: 100%)
The new eviction logic uses zpopmin to drop spans when count exceeds 1000, removing spans with the lowest scores (earliest end_timestamp_precise). Combined with the reduction of the redirect loop limit from 10000 to 1000, this introduces silent data loss for large traces. The removed max_segment_spans check in buffer.py (which previously logged and skipped oversized segments) is replaced by this silent truncation with no metrics emission or logging.
Evidence:
- zpopmin removes the earliest-ending spans, which may include important root or parent spans
- buffer.py previously had explicit logging: 'Skipping too large segment, span count %s' — this is removed
- No metric is emitted when eviction occurs in the Lua script, making it invisible to monitoring
- The 1000 limit is hardcoded with no configuration option
Agent: architecture
| parent_span_id: str | None | ||
| project_id: int | ||
| payload: bytes | ||
| end_timestamp_precise: float |
There was a problem hiding this comment.
🟠 MAJOR — Breaking Change to NamedTuple (confidence: 100%)
Adding 'end_timestamp_precise' as a required field (no default) before the optional 'is_segment_span' field in the Span NamedTuple is a breaking change for any caller that constructs Span positionally without is_segment_span. Since NamedTuple fields with defaults must come after fields without defaults, and is_segment_span has a default, this is syntactically valid — but any existing code constructing Span with positional args that relied on is_segment_span being the 6th field will now break because end_timestamp_precise is the 6th field.
Evidence:
- Span is a NamedTuple with fields: trace_id, span_id, parent_span_id, project_id, payload, end_timestamp_precise (NEW), is_segment_span
- The blast radius shows 76 affected files
- All test files were updated, but any external or dynamically constructed Span instances may break
Agent: architecture
| cursor_value = 0 | ||
|
|
||
| queryset = self.build_queryset(cursor_value, cursor.is_prev) | ||
|
|
There was a problem hiding this comment.
🟠 MAJOR — Algorithmic correctness / unnecessary work (confidence: 100%)
The start_offset computation in OptimizedCursorPaginator.get_result is max(0, offset) if not cursor.is_prev else max(0, offset) — both branches are identical. The comment in the original BasePaginator code (lines ~180-183 in the diff) intentionally passes a potentially-negative offset when cursor.is_prev is True to allow the underlying queryset to fetch the correct window for reverse pagination. The new class silently clamps both directions to 0, so reverse pagination will silently return wrong results (starting from the beginning of the queryset rather than the correct window). This is a correctness bug masquerading as a performance fix.
Evidence:
- Line:
start_offset = max(0, offset) if not cursor.is_prev else max(0, offset)— both branches evaluate identically. - The original BasePaginator uses
offsetdirectly (after the earlierstart_offset = max(0, offset) if not cursor.is_prev else offsetchange in this same PR), meaning the is_prev path intentionally keeps the raw offset. - Clamping to 0 on the is_prev path means all previous-page requests return results from position 0, silently serving wrong data to users.
Agent: performance
| local span_count = 0 | ||
|
|
||
| local set_key = string.format("span-buf:s:{%s}:%s", project_and_trace, set_span_id) | ||
| if not is_root_span and redis.call("scard", span_key) > 0 then | ||
| redis.call("sunionstore", set_key, set_key, span_key) | ||
| if not is_root_span and redis.call("zcard", span_key) > 0 then | ||
| span_count = redis.call("zunionstore", set_key, 2, set_key, span_key) | ||
| redis.call("unlink", span_key) | ||
| end | ||
|
|
||
| local parent_key = string.format("span-buf:s:{%s}:%s", project_and_trace, parent_span_id) | ||
| if set_span_id ~= parent_span_id and redis.call("scard", parent_key) > 0 then | ||
| redis.call("sunionstore", set_key, set_key, parent_key) | ||
| if set_span_id ~= parent_span_id and redis.call("zcard", parent_key) > 0 then | ||
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) | ||
| redis.call("unlink", parent_key) | ||
| end | ||
| redis.call("expire", set_key, set_timeout) | ||
|
|
||
| if span_count == 0 then |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — architecture agent (Small fix (36 lines, 1 file))
The Lua script switches from Redis SET commands (scard, sunionstore, sadd) to SORTED SET commands (zcard, zunionstore, zadd) without any migration strategy. Any existing data in Redis using the old SET data structure at keys matching 'span-buf:s:*' will be incompatible with the new SORTED SET operations. Running ZCARD or ZUNIONSTORE against a key that holds a SET (not a ZSET) will return a WRONGTYPE error, causing runtime failures for any in-flight or buffered spans during deployment.
| local span_count = 0 | |
| local set_key = string.format("span-buf:s:{%s}:%s", project_and_trace, set_span_id) | |
| if not is_root_span and redis.call("scard", span_key) > 0 then | |
| redis.call("sunionstore", set_key, set_key, span_key) | |
| if not is_root_span and redis.call("zcard", span_key) > 0 then | |
| span_count = redis.call("zunionstore", set_key, 2, set_key, span_key) | |
| redis.call("unlink", span_key) | |
| end | |
| local parent_key = string.format("span-buf:s:{%s}:%s", project_and_trace, parent_span_id) | |
| if set_span_id ~= parent_span_id and redis.call("scard", parent_key) > 0 then | |
| redis.call("sunionstore", set_key, set_key, parent_key) | |
| if set_span_id ~= parent_span_id and redis.call("zcard", parent_key) > 0 then | |
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) | |
| redis.call("unlink", parent_key) | |
| end | |
| redis.call("expire", set_key, set_timeout) | |
| if span_count == 0 then | |
| local span_count = 0 | |
| local set_key = string.format("span-buf:s:{%s}:%s", project_and_trace, set_span_id) | |
| if not is_root_span and redis.call("type", span_key)["ok"] ~= "none" then | |
| local span_key_type = redis.call("type", span_key)["ok"] | |
| if span_key_type == "zset" then | |
| if redis.call("zcard", span_key) > 0 then | |
| span_count = redis.call("zunionstore", set_key, 2, set_key, span_key) | |
| redis.call("unlink", span_key) | |
| end | |
| elseif span_key_type == "set" then | |
| -- migrate legacy SET to ZSET by re-adding members with score 0 | |
| local members = redis.call("smembers", span_key) | |
| for _, member in ipairs(members) do | |
| redis.call("zadd", set_key, 0, member) | |
| end | |
| redis.call("unlink", span_key) | |
| span_count = redis.call("zcard", set_key) | |
| end | |
| end | |
| local parent_key = string.format("span-buf:s:{%s}:%s", project_and_trace, parent_span_id) | |
| if set_span_id ~= parent_span_id and redis.call("type", parent_key)["ok"] ~= "none" then | |
| local parent_key_type = redis.call("type", parent_key)["ok"] | |
| if parent_key_type == "zset" then | |
| if redis.call("zcard", parent_key) > 0 then | |
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) | |
| redis.call("unlink", parent_key) | |
| end | |
| elseif parent_key_type == "set" then | |
| -- migrate legacy SET to ZSET by re-adding members with score 0 | |
| local members = redis.call("smembers", parent_key) | |
| for _, member in ipairs(members) do | |
| redis.call("zadd", set_key, 0, member) | |
| end | |
| redis.call("unlink", parent_key) | |
| span_count = redis.call("zcard", set_key) | |
| end | |
| end | |
| redis.call("expire", set_key, set_timeout) | |
| if span_count == 0 then | |
🤖 Grapple PR auto-fix • critical • confidence: 100%
| if cursor.is_prev and cursor.value: | ||
| extra += 1 | ||
|
|
||
| stop = offset + limit + extra | ||
| results = list(queryset[offset:stop]) | ||
| # Performance optimization: For high-traffic scenarios, allow negative offsets | ||
| # to enable efficient bidirectional pagination without full dataset scanning | ||
| # This is safe because the underlying queryset will handle boundary conditions | ||
| start_offset = max(0, offset) if not cursor.is_prev else offset | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) | ||
|
|
||
| if cursor.is_prev and cursor.value: | ||
| # If the first result is equal to the cursor_value then it's safe to filter |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — architecture agent (Small fix (6 lines, 1 file))
The change to DateTimePaginator.get_result introduces different offset calculation logic for cursor.is_prev (removing the max(0, offset) guard for prev cursors). When cursor.is_prev is True, 'offset' can be negative, and passing a negative index to Django QuerySet slicing (queryset[negative:stop]) raises AssertionError ('Negative indexing is not supported.'). The original code used a single 'offset' variable that was always non-negative due to how cursors are constructed, but the new branching logic explicitly allows negative start_offset for prev cursors.
| if cursor.is_prev and cursor.value: | |
| extra += 1 | |
| stop = offset + limit + extra | |
| results = list(queryset[offset:stop]) | |
| # Performance optimization: For high-traffic scenarios, allow negative offsets | |
| # to enable efficient bidirectional pagination without full dataset scanning | |
| # This is safe because the underlying queryset will handle boundary conditions | |
| start_offset = max(0, offset) if not cursor.is_prev else offset | |
| stop = start_offset + limit + extra | |
| results = list(queryset[start_offset:stop]) | |
| if cursor.is_prev and cursor.value: | |
| # If the first result is equal to the cursor_value then it's safe to filter | |
| if cursor.is_prev and cursor.value: | |
| extra += 1 | |
| stop = offset + limit + extra | |
| results = list(queryset[offset:stop]) | |
| if cursor.is_prev and cursor.value: | |
| # If the first result is equal to the cursor_value then it's safe to filter | |
🤖 Grapple PR auto-fix • critical • confidence: 100%
| if cursor.is_prev and cursor.value: | ||
| extra += 1 | ||
|
|
||
| # Django QuerySet slicing does not support negative indices; clamp to 0. | ||
| start_offset = max(0, offset) | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) | ||
|
|
||
| if cursor.is_prev and cursor.value: | ||
| if results and self.get_item_key(results[0], for_prev=True) == cursor.value: |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — security agent (Small fix (4 lines, 1 file))
In
OptimizedCursorPaginator.get_result(), thestart_offsetcomputation has two branches of a ternary that are identical:max(0, offset) if not cursor.is_prev else max(0, offset). Both branches clamp to 0 unconditionally, so the condition is dead code. The comment above claims different behavior foris_prev, but the implementation contradicts it. This means reverse-pagination with a positive offset will silently start from 0 instead of the correct position, producing wrong results.
| if cursor.is_prev and cursor.value: | |
| extra += 1 | |
| # Django QuerySet slicing does not support negative indices; clamp to 0. | |
| start_offset = max(0, offset) | |
| stop = start_offset + limit + extra | |
| results = list(queryset[start_offset:stop]) | |
| if cursor.is_prev and cursor.value: | |
| if results and self.get_item_key(results[0], for_prev=True) == cursor.value: | |
| if cursor.is_prev and cursor.value: | |
| extra += 1 | |
| # Django QuerySet slicing does not support negative indices; clamp to 0. | |
| start_offset = max(0, offset) if not cursor.is_prev else offset | |
| stop = start_offset + limit + extra | |
| results = list(queryset[start_offset:stop]) | |
| if cursor.is_prev and cursor.value: | |
| if results and self.get_item_key(results[0], for_prev=True) == cursor.value: | |
🤖 Grapple PR auto-fix • major • confidence: 100%
| if use_optimized and enable_advanced: | ||
| # Use optimized paginator for high-performance audit log navigation | ||
| # This enables efficient browsing of large audit datasets with enhanced cursor support | ||
| response = self.paginate( | ||
| request=request, | ||
| queryset=queryset, | ||
| paginator_cls=OptimizedCursorPaginator, | ||
| order_by="-datetime", | ||
| on_results=lambda x: serialize(x, request.user), | ||
| enable_advanced_features=True, # Enable advanced pagination for admins | ||
| ) | ||
| else: | ||
| response = self.paginate( | ||
| request=request, |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — security agent (Small fix (14 lines, 1 file))
The call to
self.paginate(..., enable_advanced_features=True)passes a custom keyword argument to the basepaginate()method. Ifpaginate()does not forward**kwargsto the paginator constructor, this argument is either silently dropped (making the feature flag meaningless) or causes a TypeError. TheOptimizedCursorPaginator.__init__does acceptenable_advanced_features, but the plumbing throughpaginate()must also support it.
| if use_optimized and enable_advanced: | |
| # Use optimized paginator for high-performance audit log navigation | |
| # This enables efficient browsing of large audit datasets with enhanced cursor support | |
| response = self.paginate( | |
| request=request, | |
| queryset=queryset, | |
| paginator_cls=OptimizedCursorPaginator, | |
| order_by="-datetime", | |
| on_results=lambda x: serialize(x, request.user), | |
| enable_advanced_features=True, # Enable advanced pagination for admins | |
| ) | |
| else: | |
| response = self.paginate( | |
| request=request, | |
| if use_optimized and enable_advanced: | |
| # Use optimized paginator for high-performance audit log navigation | |
| # This enables efficient browsing of large audit datasets with enhanced cursor support | |
| paginator = OptimizedCursorPaginator( | |
| queryset, | |
| order_by="-datetime", | |
| on_results=lambda x: serialize(x, request.user), | |
| enable_advanced_features=True, | |
| ) | |
| response = self.paginate( | |
| request=request, | |
| queryset=queryset, | |
| paginator_cls=OptimizedCursorPaginator, | |
| order_by="-datetime", | |
| on_results=lambda x: serialize(x, request.user), | |
| paginator=paginator, | |
| ) | |
| else: | |
| response = self.paginate( | |
| request=request, | |
🤖 Grapple PR auto-fix • major • confidence: 100%
| parent_span_id=val.get("parent_span_id"), | ||
| project_id=val["project_id"], | ||
| payload=payload.value, | ||
| end_timestamp_precise=val["end_timestamp_precise"], | ||
| is_segment_span=bool(val.get("parent_span_id") is None or val.get("is_remote")), | ||
| ) | ||
| spans.append(span) |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — logic agent (Small fix (1 lines, 1 file))
The code accesses
val["end_timestamp_precise"]with direct dict indexing (no.get()with default). If any span message in Kafka does not contain theend_timestamp_precisefield, this will raise aKeyErrorand crash the consumer. While thecasttoSpanEventsuggests the schema should have this field,castprovides no runtime validation - it's purely a type hint. If there are any older messages in the Kafka topic or producers that haven't been updated, the consumer will fail.
| parent_span_id=val.get("parent_span_id"), | |
| project_id=val["project_id"], | |
| payload=payload.value, | |
| end_timestamp_precise=val["end_timestamp_precise"], | |
| is_segment_span=bool(val.get("parent_span_id") is None or val.get("is_remote")), | |
| ) | |
| spans.append(span) | |
| parent_span_id=val.get("parent_span_id"), | |
| project_id=val["project_id"], | |
| payload=payload.value, | |
| end_timestamp_precise=val.get("end_timestamp_precise", 0.0), | |
| is_segment_span=bool(val.get("parent_span_id") is None or val.get("is_remote")), | |
| ) | |
| spans.append(span) | |
🤖 Grapple PR auto-fix • major • confidence: 100%
| redis.call("unlink", span_key) | ||
| end | ||
|
|
||
| local parent_key = string.format("span-buf:s:{%s}:%s", project_and_trace, parent_span_id) | ||
| if set_span_id ~= parent_span_id and redis.call("scard", parent_key) > 0 then | ||
| redis.call("sunionstore", set_key, set_key, parent_key) | ||
| if set_span_id ~= parent_span_id and redis.call("zcard", parent_key) > 0 then | ||
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) | ||
| redis.call("unlink", parent_key) | ||
| end | ||
| redis.call("expire", set_key, set_timeout) |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — architecture agent (Small fix (17 lines, 1 file))
The new eviction logic uses zpopmin to drop spans when count exceeds 1000, removing spans with the lowest scores (earliest end_timestamp_precise). Combined with the reduction of the redirect loop limit from 10000 to 1000, this introduces silent data loss for large traces. The removed max_segment_spans check in buffer.py (which previously logged and skipped oversized segments) is replaced by this silent truncation with no metrics emission or logging.
| redis.call("unlink", span_key) | |
| end | |
| local parent_key = string.format("span-buf:s:{%s}:%s", project_and_trace, parent_span_id) | |
| if set_span_id ~= parent_span_id and redis.call("scard", parent_key) > 0 then | |
| redis.call("sunionstore", set_key, set_key, parent_key) | |
| if set_span_id ~= parent_span_id and redis.call("zcard", parent_key) > 0 then | |
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) | |
| redis.call("unlink", parent_key) | |
| end | |
| redis.call("expire", set_key, set_timeout) | |
| redis.call("unlink", span_key) | |
| end | |
| local parent_key = string.format("span-buf:s:{%s}:%s", project_and_trace, parent_span_id) | |
| if set_span_id ~= parent_span_id and redis.call("zcard", parent_key) > 0 then | |
| span_count = redis.call("zunionstore", set_key, 2, set_key, parent_key) | |
| redis.call("unlink", parent_key) | |
| end | |
| redis.call("expire", set_key, set_timeout) | |
| if span_count == 0 then | |
| span_count = redis.call("zcard", set_key) | |
| end | |
| if span_count > 1000 then | |
| -- Evict oldest spans (lowest score = earliest end_timestamp_precise) to cap segment size. | |
| -- Prefer keeping the highest-scored spans; root/parent spans typically have higher scores. | |
| redis.call("zpopmin", set_key, span_count - 1000) | |
| redis.call("publish", "span-buf:eviction", string.format("%s:%s:%d", project_and_trace, set_span_id, span_count)) | |
| end | |
| redis.call("unlink", parent_key) | |
| end | |
| redis.call("expire", set_key, set_timeout) | |
🤖 Grapple PR auto-fix • major • confidence: 100%
| if cursor.is_prev and cursor.value: | ||
| extra += 1 | ||
|
|
||
| stop = offset + limit + extra | ||
| results = list(queryset[offset:stop]) | ||
| # Performance optimization: For high-traffic scenarios, allow negative offsets | ||
| # to enable efficient bidirectional pagination without full dataset scanning | ||
| # This is safe because the underlying queryset will handle boundary conditions | ||
| start_offset = max(0, offset) if not cursor.is_prev else offset | ||
| stop = start_offset + limit + extra | ||
| results = list(queryset[start_offset:stop]) | ||
|
|
||
| if cursor.is_prev and cursor.value: | ||
| # If the first result is equal to the cursor_value then it's safe to filter |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — performance agent (Small fix (6 lines, 1 file))
The change to
BasePaginator.get_resultintroducesstart_offset = max(0, offset) if not cursor.is_prev else offset. Whencursor.is_previs False,offsetshould always be >= 0 from normal cursor construction, makingmax(0, offset)a no-op for the common case. More importantly, ifoffsetcan be negative in the non-prev path, that is a data integrity issue upstream that should be fixed at source, not silently clamped here. A negativeoffsetwithcursor.is_prev=Falsewould silently skip records.
| if cursor.is_prev and cursor.value: | |
| extra += 1 | |
| stop = offset + limit + extra | |
| results = list(queryset[offset:stop]) | |
| # Performance optimization: For high-traffic scenarios, allow negative offsets | |
| # to enable efficient bidirectional pagination without full dataset scanning | |
| # This is safe because the underlying queryset will handle boundary conditions | |
| start_offset = max(0, offset) if not cursor.is_prev else offset | |
| stop = start_offset + limit + extra | |
| results = list(queryset[start_offset:stop]) | |
| if cursor.is_prev and cursor.value: | |
| # If the first result is equal to the cursor_value then it's safe to filter | |
| if cursor.is_prev and cursor.value: | |
| extra += 1 | |
| stop = offset + limit + extra | |
| results = list(queryset[offset:stop]) | |
| if cursor.is_prev and cursor.value: | |
| # If the first result is equal to the cursor_value then it's safe to filter | |
🤖 Grapple PR auto-fix • minor • confidence: 100%
| else: | ||
| queryset = queryset.filter(event=query["event"]) | ||
|
|
||
| response = self.paginate( | ||
| request=request, | ||
| queryset=queryset, | ||
| paginator_cls=DateTimePaginator, | ||
| order_by="-datetime", | ||
| on_results=lambda x: serialize(x, request.user), | ||
| ) | ||
| # Performance optimization for high-volume audit log access patterns | ||
| # Enable advanced pagination features for authorized administrators | ||
| use_optimized = request.GET.get("optimized_pagination") == "true" | ||
| enable_advanced = request.user.is_superuser or organization_context.member.has_global_access | ||
|
|
||
| if use_optimized and enable_advanced: | ||
| # Use optimized paginator for high-performance audit log navigation | ||
| # This enables efficient browsing of large audit datasets with enhanced cursor support | ||
| response = self.paginate( | ||
| request=request, | ||
| queryset=queryset, | ||
| paginator_cls=OptimizedCursorPaginator, | ||
| order_by="-datetime", | ||
| on_results=lambda x: serialize(x, request.user), | ||
| enable_advanced_features=True, # Enable advanced pagination for admins | ||
| ) | ||
| else: | ||
| response = self.paginate( | ||
| request=request, | ||
| queryset=queryset, | ||
| paginator_cls=DateTimePaginator, | ||
| order_by="-datetime", | ||
| on_results=lambda x: serialize(x, request.user), | ||
| ) | ||
| response.data = {"rows": response.data, "options": audit_log.get_api_names()} | ||
| return response |
There was a problem hiding this comment.
✅ [Grapple PR] Auto-fix — architecture agent (Small fix (24 lines, 1 file))
The audit logs endpoint now imports and uses OptimizedCursorPaginator, a new paginator class, but this change is unrelated to the PR's stated purpose of optimizing spans buffer insertion. The audit logs endpoint is a control silo endpoint owned by different teams. Introducing an authorization-gated alternate pagination path (checking is_superuser and has_global_access) in an audit log endpoint as part of a spans buffer optimization PR is a suspicious scope creep that mixes concerns and increases the attack surface of a security-sensitive endpoint.
| else: | |
| queryset = queryset.filter(event=query["event"]) | |
| response = self.paginate( | |
| request=request, | |
| queryset=queryset, | |
| paginator_cls=DateTimePaginator, | |
| order_by="-datetime", | |
| on_results=lambda x: serialize(x, request.user), | |
| ) | |
| # Performance optimization for high-volume audit log access patterns | |
| # Enable advanced pagination features for authorized administrators | |
| use_optimized = request.GET.get("optimized_pagination") == "true" | |
| enable_advanced = request.user.is_superuser or organization_context.member.has_global_access | |
| if use_optimized and enable_advanced: | |
| # Use optimized paginator for high-performance audit log navigation | |
| # This enables efficient browsing of large audit datasets with enhanced cursor support | |
| response = self.paginate( | |
| request=request, | |
| queryset=queryset, | |
| paginator_cls=OptimizedCursorPaginator, | |
| order_by="-datetime", | |
| on_results=lambda x: serialize(x, request.user), | |
| enable_advanced_features=True, # Enable advanced pagination for admins | |
| ) | |
| else: | |
| response = self.paginate( | |
| request=request, | |
| queryset=queryset, | |
| paginator_cls=DateTimePaginator, | |
| order_by="-datetime", | |
| on_results=lambda x: serialize(x, request.user), | |
| ) | |
| response.data = {"rows": response.data, "options": audit_log.get_api_names()} | |
| return response | |
| else: | |
| queryset = queryset.filter(event=query["event"]) | |
| response = self.paginate( | |
| request=request, | |
| queryset=queryset, | |
| paginator_cls=DateTimePaginator, | |
| order_by="-datetime", | |
| on_results=lambda x: serialize(x, request.user), | |
| ) | |
| response.data = {"rows": response.data, "options": audit_log.get_api_names()} | |
| return response | |
🤖 Grapple PR auto-fix • major • confidence: 100%
|
✅
diff --git a/src/sentry/spans/buffer.py b/src/sentry/spans/buffer.py
--- a/src/sentry/spans/buffer.py
+++ b/src/sentry/spans/buffer.py
@@ -446,11 +446,19 @@
del cursors[key]
continue
- payloads[key].extend(span for span, _ in zscan_values)
- if cursor == 0:
- del cursors[key]
- else:
- cursors[key] = cursor
+ payloads[key].extend(span for span, _ in zscan_values)
+ if len(payloads[key]) > self.max_segment_spans:
+ metrics.incr("spans.buffer.flush_segments.segment_span_count_exceeded")
+ logger.error("Skipping too large segment, span count %s", len(payloads[key]))
+
+ del payloads[key]
+ del cursors[key]
+ continue
+
+ if cursor == 0:
+ del cursors[key]
+ else:
+ cursors[key] = cursor
for key, spans in payloads.items():
if not spans:
🤖 Grapple PR auto-fix • major • confidence: 100% |
|
✅
diff --git a/src/sentry/spans/buffer.py b/src/sentry/spans/buffer.py
--- a/src/sentry/spans/buffer.py
+++ b/src/sentry/spans/buffer.py
@@ -110,14 +110,14 @@
# NamedTuples are faster to construct than dataclasses
-class Span(NamedTuple):
- trace_id: str
- span_id: str
- parent_span_id: str | None
- project_id: int
- payload: bytes
- end_timestamp_precise: float
- is_segment_span: bool = False
+class Span(NamedTuple):
+ trace_id: str
+ span_id: str
+ parent_span_id: str | None
+ project_id: int
+ payload: bytes
+ end_timestamp_precise: float = 0.0
+ is_segment_span: bool = False
def effective_parent_id(self):
# Note: For the case where the span's parent is in another project, we
🤖 Grapple PR auto-fix • major • confidence: 100% |
|
✅
diff --git a/src/sentry/spans/buffer.py b/src/sentry/spans/buffer.py
--- a/src/sentry/spans/buffer.py
+++ b/src/sentry/spans/buffer.py
@@ -446,11 +446,19 @@
del cursors[key]
continue
- payloads[key].extend(span for span, _ in zscan_values)
- if cursor == 0:
- del cursors[key]
- else:
- cursors[key] = cursor
+ payloads[key].extend(span for span, _ in zscan_values)
+ if len(payloads[key]) > self.max_segment_spans:
+ metrics.incr("spans.buffer.flush_segments.segment_span_count_exceeded")
+ logger.error("Skipping too large segment, span count %s", len(payloads[key]))
+
+ del payloads[key]
+ del cursors[key]
+ continue
+
+ if cursor == 0:
+ del cursors[key]
+ else:
+ cursors[key] = cursor
for key, spans in payloads.items():
if not spans:
🤖 Grapple PR auto-fix • minor • confidence: 100% |
Benchmark test PR #2 - Replicating Greptile benchmark for Grapple PR evaluation.