Introduce end-to-end idempotency and delivery guarantees across REST/GraphQL/gRPC + async pipelines (RabbitMQ/Kafka) to prevent duplicated charges/orders/expenses, message loss, or out-of-order effects. Implement the Transactional Outbox + Inbox patterns, idempotency keys on write APIs, retries with backoff, and dead-letter queues (DLQs)—with full OpenTelemetry traces and metrics.
Why
Current microservices (budgets, expenses, orders, notifications, tasks) span MongoDB, PostgreSQL, Redis, RabbitMQ, Kafka. Network glitches and retries can produce duplicates or lost messages.
Payment-like flows (orders/transactions) need “exactly-once effects” even when at-least-once delivery is used.
This aligns with existing architecture (queues, caches, multiple DBs) and hardens production behavior without changing business features.
Scope
HTTP/gRPC Idempotency Keys
Write endpoints (POST /api/orders, POST /api/expenses, POST /api/transactions) accept header Idempotency-Key (UUIDv4).
Server stores request hash + normalized response for the key; repeated calls return the same response (HTTP 200) without re-executing side effects.
Storage: Redis primary (fast), fallback Mongo collection idempotency_keys with TTL.
Transactional Outbox (producer side)
For services that publish events (orders/expenses/transactions), write the domain change and the event record in the same DB transaction (Mongo: session-based multi-document transaction; Postgres: regular tx).
A background outbox dispatcher reads pending events, publishes to RabbitMQ/Kafka, marks them delivered with a monotonic sequence and publishes trace context.
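The dispatcher loop can be sketched like this, with arrays standing in for the outbox table and the broker; the row shape and function names are assumptions for illustration.

```javascript
const outbox = []; // rows: { seq, event, status: 'pending' | 'delivered' }

function appendToOutbox(event) {
  // In the real flow this insert happens inside the same DB transaction/session
  // as the domain write, so the event exists iff the domain change committed.
  outbox.push({ seq: outbox.length + 1, event, status: 'pending' });
}

function drainOutbox(publish, batchSize = 100) {
  // Pending events are published in monotonic-sequence order, then marked
  // delivered. A crash between publish and mark only causes a re-publish
  // (absorbed by consumer-side inbox dedup), never a loss.
  const pending = outbox
    .filter((r) => r.status === 'pending')
    .sort((a, b) => a.seq - b.seq)
    .slice(0, batchSize);
  for (const row of pending) {
    publish({ seq: row.seq, ...row.event }); // trace context would ride along here
    row.status = 'delivered';
  }
  return pending.length;
}
```

On boot the same `drainOutbox` call flushes anything left pending by a crash, which is what makes restarts lossless.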
Inbox/Consumer Idempotency (consumer side)
Consumers record processed message IDs (Kafka offset + message key / RabbitMQ messageId) into an inbox table (Redis set + DB table) to ensure exactly-once processing semantics.
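A minimal sketch of that consumer-side check, with a `Set` standing in for the Redis set + DB inbox table; `handleOnce` is an illustrative name, and `messageId` is the broker's unique id (Kafka `topic:partition:offset`, or RabbitMQ `messageId`).

```javascript
const inbox = new Set();

function handleOnce(messageId, message, handler) {
  if (inbox.has(messageId)) return false; // redelivery: skip side effects
  handler(message);
  // In the real flow this record is written in the same transaction as the
  // handler's effect, so a crash between the two cannot double-apply.
  inbox.add(messageId);
  return true;
}
```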
Retry & Backoff
Standardize retry policy: exponential backoff with jitter (e.g., base 250ms, max 30s, cap 7 tries), then route to DLQ (dead-letter exchange / Kafka DLQ topic) with error reason.
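The policy above reduces to a small delay function. This sketch uses full jitter (one common choice; the issue only says "with jitter") and the stated numbers: base 250 ms, 30 s cap, 7 attempts before DLQ routing.

```javascript
function backoffMs(attempt, { baseMs = 250, maxMs = 30000, maxAttempts = 7 } = {}) {
  if (attempt >= maxAttempts) return null; // caller routes the message to the DLQ
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt); // exponential growth, capped
  return Math.floor(Math.random() * ceiling); // full jitter: uniform in [0, ceiling)
}
```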
DLQ Handling
Unified DLQ topics/queues per service, with a small CLI to replay single messages or batches after fixes:
budget-manager dlq:peek --service expenses
budget-manager dlq:replay --service expenses --since 15m
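The replay path behind `dlq:replay --since` could look like the following sketch, with arrays standing in for the DLQ and the main queue; the message shape (`failedAt`, `replayCount`) is an assumption.

```javascript
function replayDlq(dlq, mainQueue, { sinceMs = Infinity, now = Date.now() } = {}) {
  const cutoff = now - sinceMs; // only replay failures newer than --since
  const remaining = [];
  let replayed = 0;
  for (const msg of dlq) {
    if (msg.failedAt >= cutoff) {
      mainQueue.push({ ...msg, replayCount: (msg.replayCount || 0) + 1 });
      replayed += 1;
    } else {
      remaining.push(msg); // older failures stay parked for inspection
    }
  }
  dlq.length = 0;
  dlq.push(...remaining);
  return replayed;
}
```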
Ordering & Dedup
Prefer message keys (e.g., orderId) for Kafka topics to keep partition ordering.
For RabbitMQ, group by routing keys; consumers perform per-key serialization using a Redis lock (short TTL).
Logs include idempotencyKey, eventId, sagaId (when applicable).
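The per-key serialization mentioned above can be sketched with a lock table; a `Map` with expiry timestamps stands in here for the Redis `SET NX PX` lock, and the function names are illustrative.

```javascript
const locks = new Map(); // key -> expiry timestamp (ms)

function tryLock(key, ttlMs, now = Date.now()) {
  const expiry = locks.get(key);
  if (expiry !== undefined && expiry > now) return false; // held by someone else
  locks.set(key, now + ttlMs); // short TTL so a crashed holder cannot wedge the key
  return true;
}

function unlock(key) {
  locks.delete(key);
}

// A consumer handles a message for a given key (e.g. orderId) only if it wins
// the lock; otherwise it requeues/retries, keeping per-key effects serialized.
function processSerialized(key, message, handler) {
  if (!tryLock(key, 5000)) return false;
  try { handler(message); } finally { unlock(key); }
  return true;
}
```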
Acceptance Criteria
✅ Replaying the same POST /api/orders with identical payload + Idempotency-Key returns the same response; no duplicate order/transaction rows; queue publishes only once.
✅ Service restarts do not lose pending events (outbox drains on boot) and do not re-apply already processed messages (inbox dedup).
✅ DLQs capture permanently failing messages; CLI can replay single or batched messages.
✅ Grafana dashboard shows the above metrics; traces show end-to-end spans across producers/consumers.
✅ Load test proves zero duplicates across 10k idempotent writes with induced failures (network cuts, consumer crashes).
Design Notes
Metrics: outbox_pending, outbox_published_total, consumer_processed_total, consumer_dedup_hits_total, dlq_messages_total, idempotency_cache_hits_total.
Where to put outbox/inbox:
Mongo: collections outbox_events, inbox_messages; Postgres: tables outbox_events, inbox_messages.
Schemas (sketch):
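Hypothetical document shapes for the two stores; every field name below is an assumption sketched from the patterns above, not the project's actual schema.

```javascript
const outboxEvent = {
  _id: 'evt-0001',               // event id, reused as the broker messageId
  aggregate: 'order',
  aggregateId: 'o-123',
  type: 'OrderCreated',
  payload: { orderId: 'o-123', total: 42 },
  seq: 42,                       // monotonic sequence used by the dispatcher
  status: 'pending',             // pending -> delivered
  traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01', // W3C trace context example value
  createdAt: '2024-01-01T00:00:00Z',
};

const inboxMessage = {
  _id: 'orders:3:1041',          // Kafka topic:partition:offset (or RabbitMQ messageId)
  consumer: 'expenses-service',
  processedAt: '2024-01-01T00:00:01Z',
};
```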
CLI additions: extend cli.js with dlq:peek, dlq:replay, outbox:stats.
Tasks
HTTP layer: Add Idempotency-Key middleware (REST & GraphQL mutations); Redis first, Mongo fallback.
Normalize requests (stable JSON stringify, redact secrets) to generate requestHash.
Outbox write path: Wrap write endpoints to append the outbox event in the same tx/session.
Outbox dispatcher: new worker (pm2/k8s CronJob) with backpressure & batch publishes.
Consumers: add inbox check + record; ensure handler is side-effect safe on re-delivery.
Retry policy & DLQ for both RabbitMQ (DLX) and Kafka (DLQ topic).
Tracing: add OpenTelemetry SDK, instrument producers/consumers, propagate headers.
Metrics: expose Prometheus counters/gauges; dashboards JSON in /docs/observability/.
CLI: dlq:peek, dlq:replay, outbox:stats.
Tests:
Docs: /docs/reliability.md covering patterns, env vars, runbooks.
Risks & Mitigations
Assignee: @hoangsonww
Milestone: v1.2.0
Related: CI/CD, Kafka/RabbitMQ configs, Prometheus/Grafana setup