Distributed Job Scheduler

A distributed, multi-threaded job scheduler built with Java 21 and Spring Boot 3. Jobs are submitted through a REST API; workers claim them and execute them in a thread pool, with retries, cancellation, and stale-job recovery. Metrics are exposed for Prometheus, and the Docker Compose stack includes Grafana for dashboards.


✨ Features

| Feature | Description |
| --- | --- |
| REST API | Create jobs, get status, cancel pending jobs |
| Distributed workers | Poll-based claim with pessimistic locking; safe for multiple instances |
| Thread pool execution | Jobs run in a configurable ExecutorService so polling never blocks |
| Retry & failure | Configurable maxRetries; failed jobs get lastError and status FAILED |
| Stale recovery | RUNNING jobs older than a threshold are released back to PENDING |
| Observability | Actuator health, Prometheus metrics, custom counters (created, completed, failed, cancelled, etc.) |
| One-command run | Full stack (app + PostgreSQL + Redis + Prometheus + Grafana) via Docker Compose |
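
The retry and stale-recovery features can be read as two small state decisions. The sketch below illustrates them under assumptions: the `JobState` record, the `JobTransitions` class, the rule that `retryCount` is incremented on each retry, and the use of `startedAt` to measure staleness are all hypothetical and may not match the repository's actual code.

```java
import java.time.Duration;
import java.time.Instant;

enum JobStatus { PENDING, RUNNING, COMPLETED, FAILED, CANCELLED }

// Hypothetical, simplified view of a job row; field names are assumptions.
record JobState(JobStatus status, int retryCount, int maxRetries,
                Instant startedAt, String lastError) {}

final class JobTransitions {
    // After a failed attempt: retry while attempts remain, otherwise mark FAILED.
    // Both outcomes record the error message in lastError.
    static JobState onFailure(JobState job, String error) {
        if (job.retryCount() < job.maxRetries()) {
            return new JobState(JobStatus.PENDING, job.retryCount() + 1,
                    job.maxRetries(), null, error);
        }
        return new JobState(JobStatus.FAILED, job.retryCount(),
                job.maxRetries(), job.startedAt(), error);
    }

    // A RUNNING job that started before (now - threshold) is released to PENDING.
    // Any other job is returned unchanged.
    static JobState releaseIfStale(JobState job, Instant now, Duration threshold) {
        boolean stale = job.status() == JobStatus.RUNNING
                && job.startedAt() != null
                && job.startedAt().isBefore(now.minus(threshold));
        return stale
                ? new JobState(JobStatus.PENDING, job.retryCount(),
                        job.maxRetries(), null, job.lastError())
                : job;
    }
}
```

Keeping these decisions as pure functions makes them easy to unit test without a database; in a real deployment the resulting state would still have to be written back inside a transaction.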

🏗 Architecture

                    ┌─────────────────┐
                    │   REST API      │  POST/GET/PATCH cancel
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │   PostgreSQL    │  job table (Flyway migrations)
                    └────────┬────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
   ┌───────────┐       ┌───────────┐       ┌───────────┐
   │  Worker   │       │  Worker   │       │  Worker   │  poll → claim → execute (thread pool)
   └───────────┘       └───────────┘       └───────────┘
         │
         ▼
   ┌───────────┐       ┌───────────┐
   │   Redis   │       │ Prometheus│  (optional: coordination / metrics)
   └───────────┘       └───────────┘
  • Job store: PostgreSQL (single source of truth; status, payload, retries, timestamps).
  • Workers: Same app or multiple instances; poll for PENDING jobs, claim with SELECT ... FOR UPDATE, execute in a thread pool.
  • Handlers: Pluggable by job type (e.g. LOG_MESSAGE); add new types by implementing JobHandler and registering in the map.
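
Handler registration as described in the last bullet might look roughly like this. The `JobHandler` name comes from this README; its method signature, the `HandlerRegistry` class, and the `dispatch` method are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed shape of the handler contract; the real interface may differ.
interface JobHandler {
    String handle(String payload);
}

// Hypothetical registry that maps a job type to its handler.
final class HandlerRegistry {
    private final Map<String, JobHandler> handlers = new ConcurrentHashMap<>();

    void register(String type, JobHandler handler) {
        handlers.put(type, handler);
    }

    // Looks up the handler for the job type and runs it; unknown types are rejected.
    String dispatch(String type, String payload) {
        JobHandler handler = handlers.get(type);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for type: " + type);
        }
        return handler.handle(payload);
    }
}

final class HandlerRegistryDemo {
    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        // A stand-in for a LOG_MESSAGE handler, written as a lambda.
        registry.register("LOG_MESSAGE", payload -> "logged: " + payload);
        System.out.println(registry.dispatch("LOG_MESSAGE", "Hello"));
    }
}
```

With this shape, adding a job type means writing one class (or lambda) and one `register` call, leaving the polling and claiming code untouched.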

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Or: JDK 21, Gradle 8.x, PostgreSQL 16, Redis 7

Run everything with Docker (recommended)

git clone https://github.com/NullPoint3rDev/distributed-job-scheduler.git
cd distributed-job-scheduler
docker compose up -d --build

This starts:

| Service | Port | Description |
| --- | --- | --- |
| Scheduler | 8080 | Application API & workers |
| PostgreSQL | 5432 | Job storage |
| Redis | 6379 | (available for extensions) |
| Prometheus | 9090 | Metrics |
| Grafana | 3000 | Dashboards (admin / admin) |

Wait ~30 seconds for the app to apply migrations, then:

# Health
curl -s http://localhost:8080/actuator/health | jq

# Create a job
curl -s -X POST http://localhost:8080/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"type":"LOG_MESSAGE","payload":"Hello from Docker"}' | jq

# Get status (use jobId from above)
curl -s "http://localhost:8080/api/jobs/<jobId>" | jq
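
For callers who prefer Java over curl, the create request above can be built with the JDK's `java.net.http` client. This is a minimal sketch: the `CreateJobClient` class and `buildCreateRequest` method are hypothetical names, and the JSON body is assembled naively on the assumption that the values need no escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class CreateJobClient {
    // Builds the same POST request as the curl example above.
    static HttpRequest buildCreateRequest(String baseUrl, String type, String payload) {
        // Naive JSON assembly for brevity; assumes type and payload need no escaping.
        String body = "{\"type\":\"" + type + "\",\"payload\":\"" + payload + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/jobs"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Once the stack is running, the request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the response body should contain the `jobId` to use in the status call.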

Run locally (without Docker for the app)

  1. Start infrastructure: docker compose up -d postgres redis (and optionally Prometheus).
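  2. Point the app at the local services, either in application.yml or through environment variables (shown here in property notation): spring.datasource.url=jdbc:postgresql://localhost:5432/scheduler and spring.data.redis.host=localhost.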
  3. Run: ./gradlew bootRun.

📡 API

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/jobs | Create a job (body: type, optional payload, scheduledAt, maxRetries). |
| GET | /api/jobs/{id} | Get job by UUID or numeric id. |
| PATCH | /api/jobs/{id}/cancel | Cancel a PENDING job (idempotent for terminal states). |

Create job (minimal):

POST /api/jobs
{"type": "LOG_MESSAGE", "payload": "Hello"}

Response (201): {"jobId": "uuid", "status": "PENDING"}

Get job (200): Full job payload including status, createdAt, startedAt, finishedAt, retryCount, lastError, etc.

Cancel (200): Returns the current job. A PENDING job becomes CANCELLED; a job that is already COMPLETED, FAILED, or CANCELLED is returned unchanged. Cancelling a RUNNING job returns 409 Conflict instead.
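
The cancel rules can be summarised as a single mapping from the current status to an HTTP status and a resulting job status. The sketch below only restates the behaviour described above; `CancelResult` and `CancelRules` are hypothetical names, not classes from the repository.

```java
enum JobStatus { PENDING, RUNNING, COMPLETED, FAILED, CANCELLED }

// Hypothetical result type: the HTTP status to return and the job status after the call.
record CancelResult(int httpStatus, JobStatus status) {}

final class CancelRules {
    static CancelResult cancel(JobStatus current) {
        return switch (current) {
            // Only a PENDING job actually changes state.
            case PENDING -> new CancelResult(200, JobStatus.CANCELLED);
            // A job that is already executing cannot be cancelled.
            case RUNNING -> new CancelResult(409, JobStatus.RUNNING);
            // Terminal states are returned unchanged, which makes repeat calls idempotent.
            case COMPLETED, FAILED, CANCELLED -> new CancelResult(200, current);
        };
    }
}
```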


📊 Observability

  • Health: GET /actuator/health
  • Prometheus: GET /actuator/prometheus

Custom metrics (counter scheduler_jobs_total with tags):

  • event = one of created, cancelled, claimed, completed, failed, retry, stale_released
  • type = job type (e.g. LOG_MESSAGE)

In Grafana, add a Prometheus data source with the URL http://prometheus:9090 when Grafana runs inside the Docker Compose network, or http://localhost:9090 when it runs on the host. Example query: scheduler_jobs_total.


🧪 Tests

./gradlew test
  • Unit tests (e.g. JobService.cancel with mocks).
  • Integration tests (API with H2 and disabled scheduling).

📁 Project structure

src/main/java/scheduler/
├── api/          # REST controller, DTOs, JobService (create, get, cancel)
├── config/       # Worker ID, ExecutorService, SchedulerMetrics
├── domain/       # Job entity, JobStatus
├── store/        # JobRepository (JPA)
├── services/     # JobClaimService, JobExecutionService, StaleJobRecoveryService, polling
└── worker/       # JobHandler interface, LogMessageHandler

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.
