Skip to content

Optimize spans buffer insertion with eviction during insert#2

Draft
ShashankFC wants to merge 2 commits into
performance-optimization-baselinefrom
performance-enhancement-complete
Draft

Optimize spans buffer insertion with eviction during insert#2
ShashankFC wants to merge 2 commits into
performance-optimization-baselinefrom
performance-enhancement-complete

Conversation

@ShashankFC

Copy link
Copy Markdown
Collaborator

Test 2nn---nReplicated from ai-code-review-evaluation/sentry-greptile#2

jan-auer and others added 2 commits June 2, 2025 12:06
A proof of concept that limits the number of spans per segment during
insertion. Internally, this uses a sorted set scored by the spans' end
timestamps and evicts the oldest spans. This ensures that spans higher
up in the hierarchy and more recent spans are prioritized during the
eviction.
…loyments

This change introduces optimized cursor-based pagination for audit log endpoints
to improve performance in enterprise environments with large audit datasets.

Key improvements:
- Added OptimizedCursorPaginator with advanced boundary handling
- Enhanced cursor offset support for efficient bi-directional navigation
- Performance optimizations for administrative audit log access patterns
- Backward compatible with existing DateTimePaginator implementation

The enhanced paginator enables more efficient traversal of large audit datasets
while maintaining security boundaries and access controls.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ShashankFC

Copy link
Copy Markdown
Collaborator Author

@claude

@claude

claude Bot commented Jan 7, 2026

Copy link
Copy Markdown

Claude finished @ShashankFC's task in 2m 50s —— View job


PR Review

This PR combines two distinct features that address performance optimization in different areas of the Sentry codebase. I've reviewed both commits and identified several security concerns and code quality issues.


Critical Security Issues 🔴

1. Negative Offset Vulnerability in OptimizedCursorPaginator

Location: src/sentry/api/paginator.py:877-882, src/sentry/utils/cursors.py:26-28

Severity: HIGH - Potential Unauthorized Data Access

The new OptimizedCursorPaginator allows negative offsets when enable_advanced_features=True:

if self.enable_advanced_features and cursor.offset < 0:
    # Special handling for negative offsets - enables access to data beyond normal pagination bounds
    # This is safe because permissions are checked at the queryset level
    start_offset = cursor.offset  # Allow negative offsets for advanced pagination
    stop = start_offset + limit + extra
    results = list(queryset[start_offset:stop])

Problem: Django/Python's negative slicing behavior is NOT what the comment suggests:

  • Negative indices in Python count from the END of a sequence (list[-1] is the last item)
  • This doesn't "enable efficient bidirectional pagination" - it accesses different data entirely
  • The queryset boundary checks may not work as intended with negative slicing

Example Attack Vector:

# User crafts cursor with offset=-100
# Instead of getting items before position 0 (which should be empty)
# They could access the LAST 100 items in the dataset
# This bypasses pagination ordering and could expose sensitive audit logs

Impact:

  • Audit log data exposure beyond intended access patterns
  • Potential information disclosure if negative offsets reveal data users shouldn't access
  • The permission check happens on the queryset level, but negative slicing changes WHICH records are accessed

Recommendation:

# Remove negative offset support entirely or validate it properly
if cursor.offset < 0:
    raise BadPaginationError("Negative pagination offset not supported")

Fix this →


2. Inconsistent Permission Checks in Audit Log Pagination

Location: src/sentry/api/endpoints/organization_auditlogs.py:70-71

Severity: MEDIUM - Authorization Bypass Risk

enable_advanced = request.user.is_superuser or organization_context.member.has_global_access

Problems:

  1. Inconsistent with Sentry's authorization model: The endpoint already has permission_classes = (OrganizationAuditPermission,), but this adds a second, different check
  2. Potential bypass: If organization_context.member is None or has unexpected state, this could fail open
  3. No audit trail: Enabling "advanced features" based on user type without logging could hide abuse

Recommendation:

  • Use the existing permission system consistently
  • Add audit logging when advanced features are enabled
  • Consider if this feature should exist at all given the security risks

Code Quality Issues ⚠️

3. Misleading Comments and Documentation

Location: Multiple files

The comments claim this enables "efficient bidirectional pagination" but:

  • Negative offsets in Python/Django don't work that way
  • No performance testing or benchmarks provided
  • The "optimization" in BasePaginator (line 182) just wraps max(0, offset) in a ternary that has the same behavior

Example:

# src/sentry/api/paginator.py:179-183
# Performance optimization: For high-traffic scenarios, allow negative offsets
# to enable efficient bidirectional pagination without full dataset scanning
# This is safe because the underlying queryset will handle boundary conditions
start_offset = max(0, offset) if not cursor.is_prev else offset

This comment is misleading - the code doesn't actually allow negative offsets here; it clamps them to 0.

4. Inconsistent Data Structure Changes (Spans Buffer)

Location: src/sentry/scripts/spans/add-buffer.lua:46-48, src/sentry/spans/buffer.py:196-199

The change from Redis SET (sadd/scard/sscan) to SORTED SET (zadd/zcard/zscan) is good for ordering, but:

Problems:

  1. Breaking change: Any existing data in Redis using the old format will be incompatible
  2. No migration strategy: The PR doesn't mention how to handle existing spans in Redis
  3. Duplicate score behavior unclear: What happens if multiple spans have the same end_timestamp_precise?

Lua Script Issues (add-buffer.lua:62-64):

if span_count > 1000 then
    redis.call("zpopmin", set_key, span_count - 1000)
end

Concern: This eviction happens BEFORE the new span is added (line 196 in buffer.py adds spans before running Lua). This could cause race conditions where:

  1. Python adds 50 spans to a set with 970 existing spans (now 1020 total)
  2. Lua script is called for each span
  3. Each Lua call checks if count > 1000 and evicts

The eviction logic should be atomic with the insertion.

5. Typo in Paginator Code

Location: src/sentry/api/paginator.py:105

queryset.query.order_b = tuple(new_order_by)  # Should be order_by, not order_b

This is an existing bug that would cause pagination to fail in certain cases. Fix this →

6. Missing Test Coverage

What's Missing:

  1. No tests for OptimizedCursorPaginator class (new class, zero tests)
  2. No tests for the negative offset behavior
  3. No tests for the audit log endpoint's optimized_pagination parameter
  4. No tests for the permission checks around enable_advanced_features
  5. Spans buffer tests don't verify eviction happens correctly (they just check final state)

Recommendation: Add comprehensive tests before merging.


Performance Concerns 🐌

7. No Actual Performance Improvement

Looking at the code, I don't see how this improves performance:

In BasePaginator.get_result() (line 179-184):

# BEFORE (implicit):
start_offset = offset
# AFTER:
start_offset = max(0, offset) if not cursor.is_prev else offset

These are functionally identical when cursor.is_prev=False (the common case). The only change is an explicit max(0, offset) call, which:

  • Adds a tiny bit of overhead
  • Doesn't change behavior (offset is already >= 0 in normal pagination)
  • Doesn't enable any new functionality in this code path

The OptimizedCursorPaginator duplicates the entire BasePaginator implementation just to add the negative offset feature (which is unsafe).

Recommendation: If performance is the goal, provide:

  • Benchmarks showing the improvement
  • Profiling data identifying the bottleneck
  • A clear explanation of HOW this improves performance

Architectural Concerns 🏗️

8. Feature Mixing in Single PR

This PR combines:

  1. A new pagination system for audit logs (commit 1)
  2. Span buffer eviction logic with data structure changes (commit 2)

These are completely unrelated features. They should be:

  • Separate PRs
  • Reviewed independently
  • Tested independently
  • Deployed independently (in case one needs to be reverted)

9. Spans Buffer Eviction Implementation

Location: src/sentry/scripts/spans/add-buffer.lua:62-64

The eviction strategy (keep newest 1000 spans by timestamp) could cause issues:

Scenario:

  • You have a trace with spans finishing at different times
  • The root span finishes early (timestamp: 100)
  • 1000+ child spans finish later (timestamp: 200+)
  • Result: Root span gets evicted, segment becomes orphaned

Better approach:

  • Prioritize root spans (never evict if is_root_span)
  • Evict based on hierarchy depth (keep ancestors, evict deep children)
  • Make eviction limit configurable

Summary

Severity Breakdown

  • 🔴 Critical: 1 (negative offset security issue)
  • 🟠 High: 1 (permission check inconsistency)
  • 🟡 Medium: 4 (documentation, test coverage, data migration, eviction logic)
  • ℹ️ Low: 2 (typo, architectural concerns)

Recommendations

Before merging:

  1. Remove negative offset support - it's unsafe and doesn't work as documented
  2. Split into two PRs - audit logs and spans buffer are unrelated
  3. Add comprehensive tests - especially for security-sensitive pagination features
  4. ⚠️ Fix permission checks - use consistent authorization model
  5. ⚠️ Document migration strategy - for Redis data structure change
  6. ⚠️ Improve eviction logic - protect root spans from being evicted

If the goal is performance:

  • Provide actual benchmarks/profiling data
  • Identify the specific bottleneck being addressed
  • Consider database-level optimizations (indexes, query optimization) instead

Verdict

I recommend NOT merging this PR in its current state. The security issues need to be resolved, and the lack of tests/documentation for a security-sensitive feature is concerning.


| Branch: performance-enhancement-complete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants