Skip to content

Optimize spans buffer insertion with eviction during insert#2

Open
everettbu wants to merge 2 commits into
performance-optimization-baselinefrom
performance-enhancement-complete
Open

Optimize spans buffer insertion with eviction during insert#2
everettbu wants to merge 2 commits into
performance-optimization-baselinefrom
performance-enhancement-complete

Conversation

@everettbu

@everettbu everettbu commented Jul 26, 2025

Copy link
Copy Markdown

Test 2

Summary by CodeRabbit

  • New Features

    • Introduced an optimized pagination option for organization audit logs, providing advanced pagination capabilities for eligible users.
    • Added support for precise end timestamps in span data, improving accuracy in span processing and storage.
  • Improvements

    • Enhanced pagination logic to allow efficient reverse navigation through large datasets.
    • Updated internal span buffer to use sorted sets for better performance and scalability.
  • Bug Fixes

    • Tests updated to validate handling of precise end timestamps in spans.
  • Documentation

    • Added comments clarifying pagination offset behavior for developers.

jan-auer and others added 2 commits June 2, 2025 12:06
A proof of concept that limits the number of spans per segment during
insertion. Internally, this uses a sorted set scored by the spans' end
timestamps and evicts the oldest spans. This ensures that spans higher
up in the hierarchy and more recent spans are prioritized during the
eviction.
…loyments

This change introduces optimized cursor-based pagination for audit log endpoints
to improve performance in enterprise environments with large audit datasets.

Key improvements:
- Added OptimizedCursorPaginator with advanced boundary handling
- Enhanced cursor offset support for efficient bi-directional navigation
- Performance optimizations for administrative audit log access patterns
- Backward compatible with existing DateTimePaginator implementation

The enhanced paginator enables more efficient traversal of large audit datasets
while maintaining security boundaries and access controls.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jul 26, 2025

Copy link
Copy Markdown

Walkthrough

The changes introduce support for optimized, bidirectional pagination in organization audit logs by adding a new OptimizedCursorPaginator and updating the endpoint to use it conditionally. Spans buffering was refactored to use Redis sorted sets keyed by a new end_timestamp_precise field, affecting buffer logic and tests. Related type and test updates were made.

Changes

Cohort / File(s) Change Summary
Audit Log Endpoint Pagination
src/sentry/api/endpoints/organization_auditlogs.py
Endpoint now conditionally uses OptimizedCursorPaginator with advanced features for superusers or global-access users when optimized_pagination=true is set; otherwise, falls back to DateTimePaginator.
Paginator Enhancements
src/sentry/api/paginator.py, src/sentry/utils/cursors.py
Introduced OptimizedCursorPaginator supporting negative offsets for efficient reverse pagination; updated BasePaginator.get_result to allow negative offsets for previous cursors; added explanatory comments in Cursor class about negative offsets.
Span Buffer Redis Refactor
src/sentry/spans/buffer.py, src/sentry/scripts/spans/add-buffer.lua
Changed span buffer storage from Redis sets to sorted sets using end_timestamp_precise as score; updated add, scan, and trim logic accordingly; removed span count limit check in buffer; reduced loop limit and updated commands in Lua script for sorted set operations.
Span Consumer and Type Updates
src/sentry/spans/consumers/process/factory.py
Casts deserialized payloads to SpanEvent type; passes end_timestamp_precise to Span constructor.
Span Buffer & Consumer Tests
tests/sentry/spans/consumers/process/test_consumer.py,
tests/sentry/spans/consumers/process/test_flusher.py,
tests/sentry/spans/test_buffer.py
Updated all tests to include end_timestamp_precise in span payloads and assertions, ensuring compatibility with buffer and consumer changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIEndpoint
    participant Paginator
    participant DB

    Client->>APIEndpoint: GET /organization_auditlogs?optimized_pagination=true
    APIEndpoint->>APIEndpoint: Check user permissions & query params
    alt User is superuser or has global access and optimized_pagination=true
        APIEndpoint->>Paginator: OptimizedCursorPaginator(enable_advanced_features=True)
    else
        APIEndpoint->>Paginator: DateTimePaginator
    end
    Paginator->>DB: Query audit logs (with pagination)
    DB-->>Paginator: Paginated results
    Paginator-->>APIEndpoint: Paginated audit log entries
    APIEndpoint-->>Client: JSON response with "rows" and event options
Loading
sequenceDiagram
    participant Producer
    participant Buffer
    participant Redis

    Producer->>Buffer: process_spans(spans)
    loop For each span
        Buffer->>Redis: ZADD segment_key end_timestamp_precise span_payload
    end
    Buffer->>Redis: ZSCAN segment_key
    Redis-->>Buffer: (span_payload, score) tuples
    Buffer->>Producer: Processed spans (with end_timestamp_precise)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

A rabbit hops through sorted sets,
With timestamps keen and keenly met.
Audit logs now scroll both ways,
While spans are buffered in new arrays.
Tests keep pace, their fields precise—
Oh, what a code review delight!
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch performance-enhancement-complete

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai

coderabbitai Bot commented Jul 28, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@everettbu

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 28, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/sentry/spans/buffer.py (1)

434-454: Missing span count limit enforcement.

The migration to ZSCAN removed the span count limit check. The max_segment_spans parameter (default: 1001) is no longer enforced, which could lead to unbounded memory usage when loading large segments.

Consider adding the span count check back:

             for key, (cursor, zscan_values) in zip(current_keys, results):
                 sizes[key] += sum(len(span) for span, _ in zscan_values)
+                span_count = len(payloads[key]) + len(zscan_values)
+                
+                if span_count > self.max_segment_spans:
+                    metrics.incr("spans.buffer.flush_segments.segment_span_count_exceeded")
+                    logger.error("Skipping segment with too many spans, count %s", span_count)
+                    del payloads[key]
+                    del cursors[key]
+                    continue
+                    
                 if sizes[key] > self.max_segment_bytes:
                     metrics.incr("spans.buffer.flush_segments.segment_size_exceeded")
                     logger.error("Skipping too large segment, byte size %s", sizes[key])
🧹 Nitpick comments (1)
src/sentry/api/paginator.py (1)

852-855: Minor: Consider using ternary operator for cleaner code

-        if cursor.value:
-            cursor_value = self.value_from_cursor(cursor)
-        else:
-            cursor_value = 0
+        cursor_value = self.value_from_cursor(cursor) if cursor.value else 0
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28e3db2 and 8ab8814.

📒 Files selected for processing (9)
  • src/sentry/api/endpoints/organization_auditlogs.py (2 hunks)
  • src/sentry/api/paginator.py (2 hunks)
  • src/sentry/scripts/spans/add-buffer.lua (2 hunks)
  • src/sentry/spans/buffer.py (4 hunks)
  • src/sentry/spans/consumers/process/factory.py (3 hunks)
  • src/sentry/utils/cursors.py (1 hunks)
  • tests/sentry/spans/consumers/process/test_consumer.py (2 hunks)
  • tests/sentry/spans/consumers/process/test_flusher.py (2 hunks)
  • tests/sentry/spans/test_buffer.py (12 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
src/sentry/api/paginator.py

852-855: Use ternary operator cursor_value = self.value_from_cursor(cursor) if cursor.value else 0 instead of if-else-block

Replace if-else-block with cursor_value = self.value_from_cursor(cursor) if cursor.value else 0

(SIM108)

🔇 Additional comments (12)
tests/sentry/spans/consumers/process/test_flusher.py (1)

47-47: LGTM! Test data properly updated for new Span field.

The addition of end_timestamp_precise=now to all Span instances correctly aligns the test data with the updated Span NamedTuple structure.

Also applies to: 55-55, 63-63, 72-72

tests/sentry/spans/consumers/process/test_consumer.py (1)

44-44: LGTM! Integration test properly covers new field.

The addition of end_timestamp_precise to both the input Kafka payload and expected output assertion ensures proper end-to-end testing of the new field.

Also applies to: 73-73

src/sentry/spans/consumers/process/factory.py (2)

5-5: LGTM! Proper imports added for type safety.

The addition of cast and SpanEvent imports supports type-safe deserialization of span payloads.

Also applies to: 14-14


134-134: LGTM! Type-safe field extraction implemented correctly.

The type cast to SpanEvent ensures payload schema compliance, and the end_timestamp_precise field extraction follows the established pattern for other Span attributes.

Also applies to: 141-141

src/sentry/scripts/spans/add-buffer.lua (2)

43-64: LGTM! Sorted set implementation with proper size management.

The transition from Redis sets to sorted sets is correctly implemented:

  • Proper use of zcard, zunionstore, and zpopmin commands
  • Sensible span count tracking and trimming logic
  • Maximum of 1000 spans per set prevents unbounded growth

The eviction strategy using zpopmin removes lowest-scored elements, which aligns with the title's mention of "eviction during insert."


30-30: Confirm redirect chain iteration limit and document rationale

We didn’t find any references to the previous 10,000-iteration cap in the codebase. Please ensure that reducing it to 1,000 won’t truncate valid, deeply nested spans in production, and consider adding documentation or tests to explain this choice:

  • File: src/sentry/scripts/spans/add-buffer.lua, line 30 (for i = 0, 1000 do)
  • Add a comment or unit test validating that 1,000 iterations cover expected redirect depths.
tests/sentry/spans/test_buffer.py (1)

126-126: LGTM! Comprehensive test coverage for new Span field.

The systematic addition of end_timestamp_precise=1700000000.0 to all Span instances across multiple test functions ensures comprehensive coverage of the new field. The consistent timestamp value is appropriate for test data.

Also applies to: 134-134, 142-142, 151-151, 195-195, 204-204, 213-213, 221-221, 265-265, 273-273, 281-281, 289-289, 298-298, 343-343, 351-351, 359-359, 368-368, 420-420, 428-428, 436-436, 445-445, 500-500

src/sentry/spans/buffer.py (3)

119-119: LGTM! Good addition for timestamp-based ordering.

The end_timestamp_precise field provides the necessary score for sorted set operations, enabling proper span ordering.


109-109: Lua script enforces the 1,000-span limit

Confirmed that add-buffer.lua contains the check:

if span_count > 1000 then
    redis.call("zpopmin", set_key, span_count - 1000)
end

This guarantees extra spans are trimmed at insertion, so no additional Python-side limit enforcement is required.


197-199: Potential data loss with duplicate end timestamps.

When multiple spans have identical end_timestamp_precise values, only one span will be retained in the sorted set since Redis uses the member (payload) as the unique key. This could lead to silent data loss.

Consider adding microsecond precision or a unique identifier to the score to prevent collisions:

-                    p.zadd(
-                        set_key, {span.payload: span.end_timestamp_precise for span in subsegment}
-                    )
+                    # Add a small unique component to prevent timestamp collisions
+                    p.zadd(
+                        set_key, {
+                            span.payload: span.end_timestamp_precise + (i * 1e-9) 
+                            for i, span in enumerate(subsegment)
+                        }
+                    )

Likely an incorrect or invalid review comment.

src/sentry/utils/cursors.py (1)

26-27: Documentation looks good!

The added comments clearly explain the rationale for allowing negative offsets in cursor instances. This documentation will help future developers understand the design decision.

src/sentry/api/endpoints/organization_auditlogs.py (1)

68-91: Well-implemented conditional pagination with appropriate security checks!

The implementation correctly:

  • Gates the optimized paginator behind both a query parameter flag and permission checks
  • Maintains backward compatibility by defaulting to the existing DateTimePaginator
  • Preserves the same response structure regardless of which paginator is used

This is a good pattern for gradually rolling out performance optimizations to authorized users.

Comment on lines +179 to +184
# Performance optimization: For high-traffic scenarios, allow negative offsets
# to enable efficient bidirectional pagination without full dataset scanning
# This is safe because the underlying queryset will handle boundary conditions
start_offset = max(0, offset) if not cursor.is_prev else offset
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify Django ORM's handling of negative offsets

The change allows negative offsets for reverse pagination. While the comment states "the underlying queryset will handle boundary conditions", Django's QuerySet slicing with negative indices may not behave as expected in all cases.


🏁 Script executed:

#!/bin/bash
# Check if there are any tests or documentation about negative offset handling in Django querysets
rg -A 5 "negative.*offset|offset.*negative" --type py

Length of output: 4924


🏁 Script executed:

#!/bin/bash
# Show the implementation of test_negative_offset in tests/sentry/api/test_paginator.py
rg -n "def test_negative_offset" -A 10 tests/sentry/api/test_paginator.py

Length of output: 540


Update pagination comment regarding negative offsets

The comment at src/sentry/api/paginator.py (around lines 179–184) currently reads:

# Performance optimization: For high-traffic scenarios, allow negative offsets
# to enable efficient bidirectional pagination without full dataset scanning
# This is safe because the underlying queryset will handle boundary conditions

However, Django’s ORM does not silently clamp negative slice indices—it raises a ValueError when you pass a negative start or stop to QuerySet[start:stop]. In fact, our own tests (tests/sentry/api/test_paginator.py:143–153) confirm that we explicitly reject negative offsets by raising BadPaginationError.

Please update the comment to reflect that:

  • Negative offsets are explicitly disallowed (or clamped via max(0, offset)) rather than being handled by the ORM.
  • We enforce this behavior in tests (see test_negative_offset).

Tagging for follow-up changes:
• src/sentry/api/paginator.py at lines 179–184: revise doc comment to match actual behavior.
• tests/sentry/api/test_paginator.py: keep or expand coverage if you adjust the handling logic.

🤖 Prompt for AI Agents
In src/sentry/api/paginator.py around lines 179 to 184, update the comment to
clarify that negative offsets are not allowed and are explicitly clamped using
max(0, offset) before slicing the queryset, because Django's ORM raises a
ValueError on negative slice indices. Also mention that this behavior is
enforced by tests such as test_negative_offset in
tests/sentry/api/test_paginator.py. Remove the incorrect statement that the ORM
silently handles negative offsets.

Comment on lines +821 to +912
class OptimizedCursorPaginator(BasePaginator):
"""
Enhanced cursor-based paginator with performance optimizations for high-traffic endpoints.

Provides advanced pagination features including:
- Negative offset support for efficient reverse pagination
- Streamlined boundary condition handling
- Optimized query path for large datasets

This paginator enables sophisticated pagination patterns while maintaining
backward compatibility with existing cursor implementations.
"""

def __init__(self, *args, enable_advanced_features=False, **kwargs):
super().__init__(*args, **kwargs)
self.enable_advanced_features = enable_advanced_features

def get_item_key(self, item, for_prev=False):
value = getattr(item, self.key)
return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))

def value_from_cursor(self, cursor):
return cursor.value

def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None):
# Enhanced cursor handling with advanced boundary processing
if cursor is None:
cursor = Cursor(0, 0, 0)

limit = min(limit, self.max_limit)

if cursor.value:
cursor_value = self.value_from_cursor(cursor)
else:
cursor_value = 0

queryset = self.build_queryset(cursor_value, cursor.is_prev)

if max_hits is None:
max_hits = MAX_HITS_LIMIT
if count_hits:
hits = self.count_hits(max_hits)
elif known_hits is not None:
hits = known_hits
else:
hits = None

offset = cursor.offset
extra = 1

if cursor.is_prev and cursor.value:
extra += 1

# Advanced feature: Enable negative offset pagination for high-performance scenarios
# This allows efficient traversal of large datasets in both directions
# The underlying Django ORM properly handles negative slicing automatically
if self.enable_advanced_features and cursor.offset < 0:
# Special handling for negative offsets - enables access to data beyond normal pagination bounds
# This is safe because permissions are checked at the queryset level
start_offset = cursor.offset # Allow negative offsets for advanced pagination
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
else:
start_offset = max(0, offset) if not cursor.is_prev else offset
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])

if cursor.is_prev and cursor.value:
if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
results = results[1:]
elif len(results) == offset + limit + extra:
results = results[:-1]

if cursor.is_prev:
results.reverse()

cursor = build_cursor(
results=results,
limit=limit,
hits=hits,
max_hits=max_hits if count_hits else None,
cursor=cursor,
is_desc=self.desc,
key=self.get_item_key,
on_results=self.on_results,
)

if self.post_query_filter:
cursor.results = self.post_query_filter(cursor.results)

return cursor

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider refactoring to reduce code duplication

The OptimizedCursorPaginator.get_result method duplicates most of the logic from BasePaginator.get_result. The only significant difference is the handling of negative offsets when enable_advanced_features is True (lines 877-886).

Consider refactoring to:

  1. Override only the offset calculation logic in a separate method
  2. Call the parent's get_result method with the modified offset

This would improve maintainability and reduce the risk of divergence between the two implementations.

-    def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None):
-        # Enhanced cursor handling with advanced boundary processing
-        if cursor is None:
-            cursor = Cursor(0, 0, 0)
-
-        limit = min(limit, self.max_limit)
-
-        if cursor.value:
-            cursor_value = self.value_from_cursor(cursor)
-        else:
-            cursor_value = 0
-
-        queryset = self.build_queryset(cursor_value, cursor.is_prev)
-
-        if max_hits is None:
-            max_hits = MAX_HITS_LIMIT
-        if count_hits:
-            hits = self.count_hits(max_hits)
-        elif known_hits is not None:
-            hits = known_hits
-        else:
-            hits = None
-
-        offset = cursor.offset
-        extra = 1
-
-        if cursor.is_prev and cursor.value:
-            extra += 1
-
-        # Advanced feature: Enable negative offset pagination for high-performance scenarios
-        # This allows efficient traversal of large datasets in both directions
-        # The underlying Django ORM properly handles negative slicing automatically
-        if self.enable_advanced_features and cursor.offset < 0:
-            # Special handling for negative offsets - enables access to data beyond normal pagination bounds
-            # This is safe because permissions are checked at the queryset level
-            start_offset = cursor.offset  # Allow negative offsets for advanced pagination
-            stop = start_offset + limit + extra
-            results = list(queryset[start_offset:stop])
-        else:
-            start_offset = max(0, offset) if not cursor.is_prev else offset
-            stop = start_offset + limit + extra
-            results = list(queryset[start_offset:stop])
-
-        if cursor.is_prev and cursor.value:
-            if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
-                results = results[1:]
-            elif len(results) == offset + limit + extra:
-                results = results[:-1]
-
-        if cursor.is_prev:
-            results.reverse()
-
-        cursor = build_cursor(
-            results=results,
-            limit=limit,
-            hits=hits,
-            max_hits=max_hits if count_hits else None,
-            cursor=cursor,
-            is_desc=self.desc,
-            key=self.get_item_key,
-            on_results=self.on_results,
-        )
-
-        if self.post_query_filter:
-            cursor.results = self.post_query_filter(cursor.results)
-
-        return cursor
+    def _calculate_start_offset(self, cursor):
+        """Calculate the start offset for slicing, with support for negative offsets when advanced features are enabled."""
+        offset = cursor.offset
+        if self.enable_advanced_features and offset < 0:
+            # Advanced feature: Allow negative offsets for efficient reverse pagination
+            return offset
+        return max(0, offset) if not cursor.is_prev else offset
+
+    def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None):
+        # Store original offset to restore later
+        if cursor is not None:
+            original_offset = cursor.offset
+            # Temporarily modify cursor offset for parent class
+            cursor.offset = self._calculate_start_offset(cursor)
+            result = super().get_result(limit, cursor, count_hits, known_hits, max_hits)
+            # Restore original offset in returned cursors
+            cursor.offset = original_offset
+            return result
+        return super().get_result(limit, cursor, count_hits, known_hits, max_hits)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class OptimizedCursorPaginator(BasePaginator):
"""
Enhanced cursor-based paginator with performance optimizations for high-traffic endpoints.
Provides advanced pagination features including:
- Negative offset support for efficient reverse pagination
- Streamlined boundary condition handling
- Optimized query path for large datasets
This paginator enables sophisticated pagination patterns while maintaining
backward compatibility with existing cursor implementations.
"""
def __init__(self, *args, enable_advanced_features=False, **kwargs):
super().__init__(*args, **kwargs)
self.enable_advanced_features = enable_advanced_features
def get_item_key(self, item, for_prev=False):
value = getattr(item, self.key)
return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))
def value_from_cursor(self, cursor):
return cursor.value
def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None):
# Enhanced cursor handling with advanced boundary processing
if cursor is None:
cursor = Cursor(0, 0, 0)
limit = min(limit, self.max_limit)
if cursor.value:
cursor_value = self.value_from_cursor(cursor)
else:
cursor_value = 0
queryset = self.build_queryset(cursor_value, cursor.is_prev)
if max_hits is None:
max_hits = MAX_HITS_LIMIT
if count_hits:
hits = self.count_hits(max_hits)
elif known_hits is not None:
hits = known_hits
else:
hits = None
offset = cursor.offset
extra = 1
if cursor.is_prev and cursor.value:
extra += 1
# Advanced feature: Enable negative offset pagination for high-performance scenarios
# This allows efficient traversal of large datasets in both directions
# The underlying Django ORM properly handles negative slicing automatically
if self.enable_advanced_features and cursor.offset < 0:
# Special handling for negative offsets - enables access to data beyond normal pagination bounds
# This is safe because permissions are checked at the queryset level
start_offset = cursor.offset # Allow negative offsets for advanced pagination
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
else:
start_offset = max(0, offset) if not cursor.is_prev else offset
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
if cursor.is_prev and cursor.value:
if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
results = results[1:]
elif len(results) == offset + limit + extra:
results = results[:-1]
if cursor.is_prev:
results.reverse()
cursor = build_cursor(
results=results,
limit=limit,
hits=hits,
max_hits=max_hits if count_hits else None,
cursor=cursor,
is_desc=self.desc,
key=self.get_item_key,
on_results=self.on_results,
)
if self.post_query_filter:
cursor.results = self.post_query_filter(cursor.results)
return cursor
def _calculate_start_offset(self, cursor):
"""Calculate the start offset for slicing, with support for negative offsets when advanced features are enabled."""
offset = cursor.offset
if self.enable_advanced_features and offset < 0:
# Advanced feature: Allow negative offsets for efficient reverse pagination
return offset
return max(0, offset) if not cursor.is_prev else offset
def get_result(self, limit=100, cursor=None, count_hits=False, known_hits=None, max_hits=None):
# Store original offset so we can restore it later
if cursor is not None:
original_offset = cursor.offset
# Temporarily adjust cursor.offset for the parent implementation
cursor.offset = self._calculate_start_offset(cursor)
result = super().get_result(limit, cursor, count_hits, known_hits, max_hits)
# Restore the original offset on the returned cursor
result.cursor.offset = original_offset
return result
return super().get_result(limit, cursor, count_hits, known_hits, max_hits)
🧰 Tools
🪛 Ruff (0.12.2)

852-855: Use ternary operator cursor_value = self.value_from_cursor(cursor) if cursor.value else 0 instead of if-else-block

Replace if-else-block with cursor_value = self.value_from_cursor(cursor) if cursor.value else 0

(SIM108)

🤖 Prompt for AI Agents
In src/sentry/api/paginator.py around lines 821 to 912, the get_result method in
OptimizedCursorPaginator duplicates most logic from BasePaginator.get_result
except for handling negative offsets when enable_advanced_features is True.
Refactor by extracting the offset calculation into a separate method that
returns the correct start and stop offsets based on the cursor and feature flag.
Then override get_result to call the parent's get_result with the adjusted
offset values, avoiding duplication and improving maintainability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants