feat: resilient background job retry & monitoring (#130) by DrGalio · Pull Request #651 · rohitdash08/FinMind

DrGalio · 2026-03-26T10:35:50Z

Summary

Implements resilient background job retry & monitoring — closes #130.

Problem

The reminder system had a critical reliability issue: `run_due` marked `sent=True` regardless of whether `send_reminder` actually succeeded. Failed deliveries were silently lost, with no retry mechanism.
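A minimal sketch of the corrected behaviour, for illustration only. The `Reminder` dataclass, the boolean return value of `send_reminder`, and the signature of `run_due` are assumptions and may not match the code in `routes/reminders.py`; the point is only that the `sent` flag is set after a confirmed delivery rather than unconditionally.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Reminder:
    """Hypothetical stand-in for the project's reminder record."""
    id: int
    message: str
    sent: bool = False


def run_due(reminders: Iterable[Reminder],
            send_reminder: Callable[[Reminder], bool]) -> List[Reminder]:
    """Attempt delivery of each unsent reminder and return the ones that failed.

    Assumes send_reminder returns True on success; an exception is treated
    the same as a False return.
    """
    failed: List[Reminder] = []
    for reminder in reminders:
        if reminder.sent:
            continue
        try:
            delivered = send_reminder(reminder)
        except Exception:
            delivered = False
        if delivered:
            reminder.sent = True  # flag only after a confirmed delivery
        else:
            failed.append(reminder)  # sent stays False so a retry can pick it up
    return failed
```

In the PR, failed deliveries are handed to the job runner for retry; here they are simply returned to the caller.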

Solution

A production-ready background job runner with:

- Retry with exponential backoff: configurable `RetryPolicy` (default: 3 retries, 5s base delay, 2x multiplier, 300s cap)
- Dead-letter tracking: jobs that exhaust retries move to `DEAD` status for manual inspection
- Job lifecycle model: `BackgroundJob` tracks attempts, errors, and metadata through `PENDING` → `RUNNING` → `COMPLETED` / `RETRYING` / `DEAD`
- Monitoring endpoints:
  - `GET /reminders/jobs/stats`: aggregated counts by status
  - `GET /reminders/jobs/dead`: dead-lettered jobs for inspection
  - `GET /reminders/jobs/retrying`: jobs waiting for next retry
  - `POST /reminders/jobs/<id>/retry`: manually re-queue dead jobs
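To make the backoff arithmetic concrete, here is a rough sketch of how a capped exponential delay could be computed from the defaults listed above. The field names, the `delay_for` method, and the 1-based retry numbering are assumptions for illustration; the actual `RetryPolicy` in `services/job_runner.py` may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Illustrative policy; defaults mirror the values in the PR description."""
    max_retries: int = 3
    base_delay: float = 5.0    # seconds
    multiplier: float = 2.0
    max_delay: float = 300.0   # seconds, upper bound on any single delay

    def delay_for(self, retry_number: int) -> float:
        """Seconds to wait before the given retry (1 = first retry)."""
        if retry_number < 1:
            raise ValueError("retry_number must be >= 1")
        raw = self.base_delay * self.multiplier ** (retry_number - 1)
        return min(raw, self.max_delay)


if __name__ == "__main__":
    policy = RetryPolicy()
    for n in range(1, policy.max_retries + 1):
        print(f"retry {n}: wait {policy.delay_for(n)}s")
```

Under these assumptions the three default retries wait 5, 10, and 20 seconds. The 300s cap only comes into play with a higher retry count: the seventh retry would otherwise wait 320 seconds.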

Changes

| File | Description |
| --- | --- |
| `services/job_runner.py` (new) | Core job runner with retry logic |
| `models.py` | Added `BackgroundJob` model and `JobStatus` enum |
| `routes/reminders.py` | Integrated job runner, fixed sent-boolean bug, added monitoring routes |
| `db/schema.sql` | Added `background_jobs` table with indexes |
| `tests/test_job_runner.py` (new) | 11 tests covering retry policy, execution, monitoring |

Tests

All 11 new tests pass:

- Retry policy: exponential backoff, max delay cap
- Enqueue: creates `PENDING` job, custom `max_retries`
- Execution: success, retry on failure, dead after max retries, succeed after retries, no handler
- Monitoring: stats, manual retry of dead jobs
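As a rough illustration of what the execution cases exercise, the sketch below models the status transitions in memory and checks two of them with plain assertions. The `Job` dataclass, the `execute` function, the handler registry, and the choice to dead-letter a job with no registered handler are all assumptions; the real tests in `tests/test_job_runner.py` run against the `BackgroundJob` model and are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    RETRYING = "RETRYING"
    DEAD = "DEAD"


@dataclass
class Job:
    """Hypothetical in-memory stand-in for BackgroundJob."""
    job_type: str
    max_retries: int = 3
    attempts: int = 0
    status: JobStatus = JobStatus.PENDING
    last_error: Optional[str] = None


def execute(job: Job, handlers: Dict[str, Callable[[], None]]) -> JobStatus:
    """Run one attempt of the job and advance its status."""
    handler = handlers.get(job.job_type)
    if handler is None:
        # Assumption: a job nobody can handle is dead-lettered immediately.
        job.status = JobStatus.DEAD
        job.last_error = f"no handler for {job.job_type!r}"
        return job.status
    job.status = JobStatus.RUNNING
    job.attempts += 1
    try:
        handler()
    except Exception as exc:
        job.last_error = str(exc)
        # attempts counts the first try plus every retry so far
        exhausted = job.attempts > job.max_retries
        job.status = JobStatus.DEAD if exhausted else JobStatus.RETRYING
    else:
        job.status = JobStatus.COMPLETED
    return job.status


def test_dead_after_max_retries():
    def always_fail():
        raise RuntimeError("delivery failed")

    job = Job("reminder", max_retries=2)
    handlers = {"reminder": always_fail}
    assert execute(job, handlers) is JobStatus.RETRYING  # first try
    assert execute(job, handlers) is JobStatus.RETRYING  # retry 1
    assert execute(job, handlers) is JobStatus.DEAD      # retry 2 exhausts the budget


def test_succeeds_after_retries():
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient")

    job = Job("reminder")
    handlers = {"reminder": flaky}
    while execute(job, handlers) is JobStatus.RETRYING:
        pass
    assert job.status is JobStatus.COMPLETED
    assert job.attempts == 3
```

The sketch omits the backoff delay between attempts and any persistence, so it only shows the status bookkeeping.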

Acceptance Criteria (from issue)

✅ Production ready implementation
✅ Includes tests
✅ Documentation updated (schema.sql)

Add a production-ready background job runner with retry logic, exponential backoff, dead-letter tracking, and monitoring endpoints.

Changes:
- New JobRunner service with configurable RetryPolicy and exponential backoff
- BackgroundJob model tracking full job lifecycle (PENDING/RUNNING/COMPLETED/RETRYING/DEAD)
- Fixed bug: reminder delivery now only marks sent=True on actual success
- Failed deliveries auto-retry with exponential backoff (default 3 retries)
- New monitoring endpoints: /jobs/stats, /jobs/dead, /jobs/retrying, /jobs/<id>/retry
- Manual retry support for dead-lettered jobs
- 11 new tests covering retry policy, execution, monitoring

Closes rohitdash08#130

DrGalio requested a review from rohitdash08 as a code owner March 26, 2026 10:35

DrGalio wants to merge 1 commit into rohitdash08:main from DrGalio:feat/resilient-background-job-retry
