A backend engineering project that incrementally builds a high-throughput, fault-aware notification processing system, evolving from in-memory queues to cloud-native distributed infrastructure.
The project is developed phase by phase to mirror how real production systems are designed, stressed, and hardened under load.
Notification systems (OTP, alerts, marketing messages) must handle:
- Sudden traffic spikes
- Slow or failing downstream providers
- Backpressure without crashing
- Eventual retries and dead-letter handling
Synchronous processing breaks under load. This project focuses on decoupling ingestion from processing and progressively introducing reliability and scale.
The system accepts notification requests and places them into a bounded in-memory queue, returning immediately to the client without blocking on processing.
```
Client
  |
  v
POST /notification
  |
  v
NotificationController
  |
  v
InMemoryNotificationQueue (BlockingQueue, bounded)
```
- Requests are accepted via REST
- Notifications are enqueued, not processed synchronously
- This prevents slow consumers from impacting API latency
- Backed by a `LinkedBlockingQueue` with fixed capacity
- When full, requests are explicitly rejected
- Prevents unbounded memory growth
This mimics how real systems behave under overload.
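A minimal sketch of such a bounded queue component (class and method names are illustrative, not necessarily the project's actual code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: a bounded in-memory queue that rejects instead of blocking.
public class InMemoryNotificationQueue {

    // Assumed simple DTO for a notification request.
    public record Notification(String type, String recipient, String message) {}

    // Fixed capacity (1,000 in the load test) makes backpressure explicit.
    private final BlockingQueue<Notification> queue = new LinkedBlockingQueue<>(1_000);

    // Non-blocking enqueue: returns false immediately when the queue is full.
    public boolean tryEnqueue(Notification notification) {
        return queue.offer(notification);
    }

    // Consumers block here until an item is available.
    public Notification take() throws InterruptedException {
        return queue.take();
    }

    public int depth() {
        return queue.size();
    }
}
```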
Request
```json
{
  "type": "EMAIL",
  "recipient": "user@test.com",
  "message": "Hello"
}
```

Responses

- `202 Accepted` → notification queued
- `429 Too Many Requests` → system overloaded (queue full)
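A controller sketch showing how enqueue results map to these response codes, assuming Spring Boot and the queue component sketched above (names are illustrative):

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// Illustrative controller sketch; the project's actual controller may differ.
@RestController
public class NotificationController {

    private final InMemoryNotificationQueue queue = new InMemoryNotificationQueue();

    @PostMapping("/notification")
    public ResponseEntity<String> enqueue(@RequestBody InMemoryNotificationQueue.Notification request) {
        // Non-blocking enqueue keeps API latency flat even when processing is slow.
        boolean accepted = queue.tryEnqueue(request);
        return accepted
                ? ResponseEntity.status(HttpStatus.ACCEPTED).body("queued")               // 202
                : ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS).body("queue full"); // 429
    }
}
```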
A custom Java-based load simulator was used to stress the ingestion layer.
- Total requests: 10,000
- Max concurrent in-flight requests: 50
- Queue capacity: 1,000
- No consumer draining the queue
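The simulator itself is not shown here; the sketch below illustrates the general approach (Java `HttpClient`, a semaphore capping in-flight requests, and an assumed local endpoint and payload):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative load-simulator sketch; the project's actual simulator may differ.
public class LoadSimulator {

    public static void main(String[] args) throws Exception {
        int totalRequests = 10_000;
        Semaphore inFlight = new Semaphore(50);        // max concurrent in-flight requests
        AtomicInteger accepted = new AtomicInteger();  // 2xx responses
        AtomicInteger rejected = new AtomicInteger();  // 429 responses
        AtomicInteger errors = new AtomicInteger();

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/notification"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"type\":\"EMAIL\",\"recipient\":\"user@test.com\",\"message\":\"Hello\"}"))
                .build();

        long start = System.currentTimeMillis();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[totalRequests];
        for (int i = 0; i < totalRequests; i++) {
            inFlight.acquire(); // cap concurrency at 50
            futures[i] = client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                    .whenComplete((response, throwable) -> {
                        if (throwable != null) {
                            errors.incrementAndGet();
                        } else if (response.statusCode() == 429) {
                            rejected.incrementAndGet();
                        } else if (response.statusCode() / 100 == 2) {
                            accepted.incrementAndGet();
                        } else {
                            errors.incrementAndGet();
                        }
                        inFlight.release();
                    });
        }
        CompletableFuture.allOf(futures).join();
        long elapsedMs = System.currentTimeMillis() - start;

        System.out.printf("Accepted: %d, Rejected: %d, Errors: %d, Time Taken(ms): %d%n",
                accepted.get(), rejected.get(), errors.get(), elapsedMs);
    }
}
```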
```
========== LOAD TEST RESULT ==========
Total Requests: 10000
Accepted (2xx): 1455
Rejected (429): 8545
Errors: 0
Time Taken(ms): 3398
Throughput (req/sec): ~2942
```
- Ingestion remained stable under burst traffic
- System rejected excess load gracefully
- No thread exhaustion or application crashes
- Backpressure behavior is visible and intentional
This validates the producer–consumer decoupling model.
Concepts demonstrated:
- Producer–Consumer pattern
- Asynchronous request handling
- Explicit overload protection
- Stable ingestion under high traffic
```
Client
  |
  v
POST /notification
  |
  v
NotificationController
  |
  v
InMemoryNotificationQueue (bounded)
  |
  v
NotificationDispatcher (single thread)
  |
  v
WorkerPool (ThreadPoolExecutor, bounded)
  |
  v
Notification Processing
```
This architecture directly maps to:
- Kafka consumer group model
- SQS poller + ECS worker tasks
- RabbitMQ worker queues
- Requests are accepted via REST
- Notifications are enqueued, not processed synchronously
- API latency remains stable even if processing is slow
- Bounded queue prevents memory blow-up
- Queue full → HTTP `429 Too Many Requests`
- Overload is explicit, not silent
Naive approach (intentionally avoided):
```java
new Thread(() -> {
    while (true) {
        process();
    }
}).start();
```

This is:
- Unbounded
- Impossible to tune
- Not observable
- Unsafe under load
Instead, the worker pool is a bounded `ThreadPoolExecutor`:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ExecutorService workerPool = new ThreadPoolExecutor(
        10,                             // core threads
        20,                             // max threads
        60, TimeUnit.SECONDS,           // keep-alive for idle non-core threads
        new ArrayBlockingQueue<>(500),  // bounded internal work queue
        new ThreadPoolExecutor.CallerRunsPolicy()  // backpressure when saturated
);
```

A single dispatcher thread pulls from the queue and submits tasks to the worker pool.
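A dispatcher sketch under the same assumptions, reusing the queue component and worker pool shown above (`process` is a stand-in for the actual provider call):

```java
import java.util.concurrent.ExecutorService;

// Illustrative dispatcher: one thread bridges the ingestion queue and the worker pool.
public class NotificationDispatcher implements Runnable {

    private final InMemoryNotificationQueue queue;  // bounded ingestion queue (sketched earlier)
    private final ExecutorService workerPool;       // the bounded ThreadPoolExecutor above

    public NotificationDispatcher(InMemoryNotificationQueue queue, ExecutorService workerPool) {
        this.queue = queue;
        this.workerPool = workerPool;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                var notification = queue.take();                 // blocks until work arrives
                workerPool.submit(() -> process(notification));  // hand off to the bounded pool
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();                  // allow clean shutdown
        }
    }

    private void process(InMemoryNotificationQueue.Notification notification) {
        // Delegate to the actual provider (email, SMS, ...); failure handling arrives in later phases.
    }
}
```

With `CallerRunsPolicy`, a saturated worker pool makes the dispatcher execute the task itself, which naturally slows the rate at which it drains the ingestion queue.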
A custom Java-based load simulator was used to stress the full pipeline, driving traffic in through the ingestion layer.
- Total requests: 10,000
- Client concurrency: 50
- Bounded ingestion queue
- Dispatcher + bounded worker pool
- Processing delay simulated (~100 ms)
```
========== LOAD TEST RESULT ==========
Total Requests: 10000
Accepted (2xx): 1940
Rejected (429): 8060
Errors: 0
Time Taken(ms): 3620
Throughput (req/sec): 2762
```
In this phase, I simulated real-world downstream latency and failures to observe how the system behaves under stress.
- Introduced artificial latency (50–350ms) in the Email notification provider to mimic slow external services.
- Simulated random failures (~20%) during notification delivery.
- Kept the queue bounded and executor thread-limited to enforce backpressure.
- Ensured the API responds quickly with accept or reject without blocking on processing.
The goal was to validate that:
- Slow providers reduce effective processing capacity
- The system applies backpressure instead of failing
- Requests are rejected early when the system is overloaded
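A minimal sketch of a slow, failure-injecting provider matching the latency and failure rates above (class and method names are assumptions):

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative slow, unreliable email provider used only for simulation.
public class FlakyEmailProvider {

    public void send(String recipient, String message) throws InterruptedException {
        // Artificial latency between 50 and 350 ms to mimic a slow external service.
        Thread.sleep(ThreadLocalRandom.current().nextLong(50, 351));

        // Roughly 20% of deliveries fail, simulating an unreliable downstream provider.
        if (ThreadLocalRandom.current().nextDouble() < 0.20) {
            throw new RuntimeException("Simulated provider failure for " + recipient);
        }
        // Otherwise the delivery is treated as successful.
    }
}
```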
```
Total Requests: 10000
Accepted (2xx): 1359
Rejected (429): 8641
Errors: 0
Time Taken(ms): 2299
Throughput (req/sec): 4349
```
- High rejection rate confirms early backpressure under slow downstream conditions.
- Zero errors indicate system stability despite latency and failures.
- Higher throughput reflects fast rejection, not faster processing.
- Processing continues asynchronously after HTTP responses are returned.
This phase demonstrates how downstream latency directly impacts system capacity and why early rejection is critical for stability.
Enhance the notification processing pipeline with fault tolerance, controlled retries, and operational visibility while maintaining system stability under heavy load.
- Failed notification deliveries are retried up to a configured maximum retry limit
- Retries are re-enqueued instead of blocking worker threads
- Prevents infinite retry loops and retry storms during high failure rates
Observed Behavior
- Retry attempts increased gradually during load
- Retries were spread over time, avoiding bursts
- System remained stable even with intentional provider failures
- Notifications exceeding the retry limit are moved to a Dead Letter Queue
- DLQ isolates permanently failing notifications from the main pipeline
- DLQ consumer logs failed notification IDs for post-mortem analysis
Observed Behavior
- Only a small number of notifications reached the DLQ
- DLQ growth occurred strictly after retry exhaustion
- Main processing pipeline remained unaffected
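A sketch of how bounded, delayed retries and the DLQ hand-off can fit together (class names, the retry limit, and the delay value are assumptions, not the project's exact configuration):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Assumed task wrapper carrying the delivery attempt count.
class NotificationTask {
    final String recipient;
    final String message;
    int attempts;

    NotificationTask(String recipient, String message) {
        this.recipient = recipient;
        this.message = message;
    }
}

// Illustrative retry + DLQ handling.
public class RetryHandler {

    private static final int MAX_RETRIES = 3;
    private static final long RETRY_DELAY_MS = 1_000;

    private final BlockingQueue<NotificationTask> mainQueue;
    private final BlockingQueue<NotificationTask> deadLetterQueue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService retryScheduler;

    public RetryHandler(BlockingQueue<NotificationTask> mainQueue, ScheduledExecutorService retryScheduler) {
        this.mainQueue = mainQueue;
        this.retryScheduler = retryScheduler;
    }

    // Called by a worker when delivery fails; never blocks the worker thread.
    public void onFailure(NotificationTask task) {
        task.attempts++;
        if (task.attempts <= MAX_RETRIES) {
            // Delayed re-enqueue spreads retries over time and avoids retry storms.
            retryScheduler.schedule(() -> { mainQueue.offer(task); }, RETRY_DELAY_MS, TimeUnit.MILLISECONDS);
        } else {
            // Retry budget exhausted: isolate the notification in the DLQ for later analysis.
            deadLetterQueue.offer(task);
        }
    }

    public int dlqSize() {
        return deadLetterQueue.size();
    }
}
```

Scheduling the re-enqueue instead of sleeping inside the worker keeps worker threads free, which is what spreads retries over time without blocking the pool.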
- Fixed-size worker pool processes notifications concurrently
- Internal bounded queue absorbs bursts when workers are saturated
- Protects the system from thread exhaustion and overload
Observed Behavior
- Worker pool consistently hit max utilization under load
- Queue size increased temporarily and drained smoothly after load completion
- No thread leaks or deadlocks observed
A periodic system reporter logs real-time operational metrics:
- Active worker count
- Queue depth
- Completed task count
- Total retry attempts
- Successful deliveries
- Failed attempts
- DLQ size
These metrics provided a clear timeline view of system behavior during stress testing.
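A sketch of such a periodic reporter (class name, wiring, and the 5-second interval are assumptions):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative periodic metrics reporter.
public class SystemReporter {

    private final ThreadPoolExecutor workerPool;
    private final BlockingQueue<?> ingestionQueue;
    private final BlockingQueue<?> deadLetterQueue;

    private final AtomicLong retries = new AtomicLong();
    private final AtomicLong successes = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public SystemReporter(ThreadPoolExecutor workerPool,
                          BlockingQueue<?> ingestionQueue,
                          BlockingQueue<?> deadLetterQueue) {
        this.workerPool = workerPool;
        this.ingestionQueue = ingestionQueue;
        this.deadLetterQueue = deadLetterQueue;
    }

    // Workers call these to feed the counters.
    public void recordRetry()   { retries.incrementAndGet(); }
    public void recordSuccess() { successes.incrementAndGet(); }
    public void recordFailure() { failures.incrementAndGet(); }

    // Log a metrics snapshot every 5 seconds.
    public void start() {
        scheduler.scheduleAtFixedRate(() -> System.out.printf(
                "active=%d queueDepth=%d completed=%d retries=%d success=%d failed=%d dlq=%d%n",
                workerPool.getActiveCount(),
                ingestionQueue.size(),
                workerPool.getCompletedTaskCount(),
                retries.get(),
                successes.get(),
                failures.get(),
                deadLetterQueue.size()),
                5, 5, TimeUnit.SECONDS);
    }
}
```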
```
========== LOAD TEST RESULT ==========
Total Requests: 10000
Accepted (2xx): 1674
Rejected (429): 8326
Errors: 0
Time Taken(ms): 2444
Throughput (req/sec): 4091
```
Interpretation
- High rejection count confirms effective backpressure via bounded queues
- Zero errors indicate system stability under overload
- High throughput shows fast ingestion while processing remains controlled
- Retries and DLQ ensured failures were handled safely without meltdown
Phase 4 upgrades the system into a resilient, production-ready asynchronous pipeline capable of:
- Handling transient failures with delayed retries
- Isolating permanent failures using DLQ
- Maintaining stability under overload
- Providing actionable observability for operators
Key lessons:
- Why retries must be bounded and delayed
- How DLQs protect core processing paths
- How backpressure prevents cascading failures
- How metrics reveal real system behavior under load