feat(uptime): Add ability to use queues to manage parallelism #9
everettbu wants to merge 1 commit into
One potential problem with batch processing is that any one slow item clogs up the whole batch. This PR implements a queueing method instead, where we keep N queues that each have their own workers. There is still a chance of individual items backlogging a queue, but we can try increasing concurrency to reduce the chances of that happening.
Greptile Summary
This PR introduces a new thread-queue-parallel processing mode for Kafka consumers in Sentry's uptime monitoring system. The implementation adds sophisticated queue-based parallelism that maintains ordering guarantees within subscription groups while enabling concurrent processing across different groups.
The core addition is the queue_consumer.py module, which implements several key components:
- OffsetTracker: Manages safe offset commits with gap handling to prevent message loss
- OrderedQueueWorker: Thread-based workers that process messages from specific queues
- FixedQueuePool: Distributes work across fixed queues using consistent hashing
- SimpleQueueProcessingStrategy: Main processing strategy that integrates with Arroyo framework
The system uses consistent hashing to route messages to specific worker queues, ensuring all messages for the same subscription group are processed in order while allowing parallelism across different groups. This addresses limitations of existing modes: serial processing is too slow, multiprocessing has overhead, and batched-parallel still processes in batches.
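The routing described above can be sketched roughly as follows. This is a minimal illustration and not the PR's FixedQueuePool: the class name GroupRoutedPool, the handler callback, the None sentinel, and the choice of a CRC32 hash taken modulo the queue count (a simple stable hash, not a hash ring) are all assumptions made for the example.

```python
import queue
import threading
import zlib
from typing import Callable, Dict, List


class GroupRoutedPool:
    """Hypothetical pool: one worker thread per queue, items routed by group key."""

    def __init__(self, num_queues: int, handler: Callable[[str, int], None]) -> None:
        self.queues: List["queue.Queue"] = [queue.Queue() for _ in range(num_queues)]
        self.handler = handler
        self.workers = [
            threading.Thread(target=self._run, args=(q,), daemon=True)
            for q in self.queues
        ]
        for w in self.workers:
            w.start()

    def queue_index(self, group_key: str) -> int:
        # CRC32 is stable across processes, unlike the built-in hash() for str.
        return zlib.crc32(group_key.encode("utf-8")) % len(self.queues)

    def submit(self, group_key: str, value: int) -> None:
        # Every item for the same group lands on the same FIFO queue.
        self.queues[self.queue_index(group_key)].put((group_key, value))

    def _run(self, q: "queue.Queue") -> None:
        while True:
            item = q.get()
            try:
                if item is None:  # sentinel: stop this worker
                    return
                self.handler(*item)
            finally:
                q.task_done()

    def close(self) -> None:
        for q in self.queues:
            q.put(None)
        for w in self.workers:
            w.join()


# Usage: five groups spread over four queues, processed concurrently.
seen: Dict[str, List[int]] = {}
seen_lock = threading.Lock()


def record(group_key: str, value: int) -> None:
    with seen_lock:
        seen.setdefault(group_key, []).append(value)


pool = GroupRoutedPool(num_queues=4, handler=record)
for i in range(50):
    pool.submit(f"group-{i % 5}", i)
pool.close()
```

Because each group maps to exactly one queue and each queue has a single worker, values within a group are handled in submission order, while different groups may interleave freely. The trade-off noted in the PR description still applies: a slow item delays every group that shares its queue.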
The integration involves:
- Adding the new mode to ResultsStrategyFactory with early result_processor initialization
- Updating CLI options in consumers/__init__.py to include the new mode
- Comprehensive test coverage including unit tests and Kafka integration tests
- Clean-up of test module __init__.py files
This architecture is particularly valuable for uptime monitoring where order matters within individual subscriptions but different subscriptions can be processed independently, providing natural backpressure and load distribution.
Confidence score: 3/5
- This PR introduces significant concurrency complexity that could lead to subtle race conditions or deadlocks in production
- The offset tracking logic in OffsetTracker.commit_offset() has complex gap handling that could potentially commit incorrect offsets under edge cases
- The thread-safe partition lock creation using setdefault pattern and exception handling for queue.ShutDown in worker threads need careful review for correctness
- Files needing more attention: src/sentry/remote_subscriptions/consumers/queue_consumer.py (lines 67-98 for offset commit logic, lines 49-54 for thread-safe operations), src/sentry/remote_subscriptions/consumers/result_consumer.py (early processor initialization and queue pool management)
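To make the offset-commit concern concrete, here is a minimal sketch of one way to decide which offset is safe to commit when messages finish out of order. It is not the PR's OffsetTracker: the class ContiguousOffsetTracker and its add, complete and committable methods are hypothetical names, and the rule used (only commit a finished offset that is lower than every unfinished one in the same partition) is an assumption about what "gap handling" is meant to guarantee.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional, Set


class ContiguousOffsetTracker:
    """Hypothetical tracker: an offset is committable only if nothing earlier is unfinished."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Dict[int, Set[int]] = defaultdict(set)  # received, not finished
        self._done: Dict[int, Set[int]] = defaultdict(set)     # finished, not yet committed

    def add(self, partition: int, offset: int) -> None:
        with self._lock:
            self._pending[partition].add(offset)

    def complete(self, partition: int, offset: int) -> None:
        with self._lock:
            self._pending[partition].discard(offset)
            self._done[partition].add(offset)

    def committable(self, partition: int) -> Optional[int]:
        """Return the highest finished offset below every pending one, or None."""
        with self._lock:
            done = self._done[partition]
            pending = self._pending[partition]
            limit = min(pending) if pending else None
            safe = [o for o in done if limit is None or o < limit]
            if not safe:
                return None
            best = max(safe)
            # Forget everything at or below the offset we are handing out.
            self._done[partition] = {o for o in done if o > best}
            return best


# Usage: offsets 11 and 12 finish before 10, so nothing is committable yet.
tracker = ContiguousOffsetTracker()
for offset in (10, 11, 12, 13):
    tracker.add(0, offset)
tracker.complete(0, 11)
tracker.complete(0, 12)
blocked = tracker.committable(0)      # offset 10 is still in flight
tracker.complete(0, 10)
after_gap_closed = tracker.committable(0)
```

In this sketch a caller would typically commit the returned value plus one, since Kafka's committed offset names the next message to consume. The property worth checking in review is the same one the sketch enforces: a commit must never move past an offset that some worker has not finished.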
7 files reviewed, 2 comments
    except queue.ShutDown:
        break
logic: Exception handling assumes queue.ShutDown exists, but the standard library queue module only provides this exception on Python 3.13 and later
    for q in self.queues:
        try:
            q.shutdown(immediate=False)
logic: Standard library queue.Queue only has a shutdown() method with an immediate parameter on Python 3.13 and later
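Queue.shutdown() and the queue.ShutDown exception were added to the standard library in Python 3.13, so whether the two comments above matter depends on which interpreter the consumer runs on. The sketch below shows one possible way to keep a worker loop valid on both older and newer versions. The names HAS_SHUTDOWN, ShutDownError, _NeverRaised, _STOP, worker and stop are hypothetical and do not come from the PR.

```python
import queue
import threading
from typing import List

# True on Python 3.13+, where Queue.shutdown() and queue.ShutDown exist.
HAS_SHUTDOWN = hasattr(queue.Queue, "shutdown")
_STOP = object()  # hypothetical sentinel used on older interpreters


class _NeverRaised(Exception):
    """Placeholder so the except clause below is valid before Python 3.13."""


ShutDownError = getattr(queue, "ShutDown", _NeverRaised)


def worker(q: "queue.Queue", out: List[int]) -> None:
    while True:
        try:
            item = q.get()
        except ShutDownError:
            break  # 3.13+: the queue was shut down and has been drained
        if item is _STOP:
            q.task_done()
            break  # older versions: sentinel arrives after all real items
        out.append(item)
        q.task_done()


def stop(q: "queue.Queue") -> None:
    if HAS_SHUTDOWN:
        # immediate=False lets already-queued items be delivered first.
        q.shutdown(immediate=False)
    else:
        q.put(_STOP)


# Usage: items queued before stop() are still processed on either path.
q: "queue.Queue" = queue.Queue()
results: List[int] = []
t = threading.Thread(target=worker, args=(q, results))
t.start()
for i in range(5):
    q.put(i)
stop(q)
t.join(timeout=5)
```

If the project pins Python 3.13 or newer, the fallback branch is unnecessary and the original except queue.ShutDown handling is valid as written.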