A distributed, multi-threaded job scheduler built with Java 21 and Spring Boot 3. Submit jobs via REST API; workers claim and execute them in a thread pool, with retries, cancellation, and stale-job recovery. Ready for observability with Prometheus and Grafana.
| Feature | Description |
|---|---|
| REST API | Create jobs, get status, cancel pending jobs |
| Distributed workers | Poll-based claim with pessimistic locking; safe for multiple instances |
| Thread pool execution | Jobs run in a configurable ExecutorService so polling never blocks |
| Retry & failure | Configurable maxRetries; failed jobs get lastError and status FAILED |
| Stale recovery | RUNNING jobs older than a threshold are released back to PENDING |
| Observability | Actuator health, Prometheus metrics, custom counters (created, completed, failed, cancelled, etc.) |
| One-command run | Full stack (app + PostgreSQL + Redis + Prometheus + Grafana) via Docker Compose |
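
The retry and stale-recovery rows in the table above can be read as two small state transitions. The following is a minimal sketch in plain Java, not the project's code: the `Job` record, the `RetryPolicySketch` class, and the rule that a job is requeued while `retryCount < maxRetries` are assumptions made for illustration. Only the status names, `maxRetries`, `retryCount`, `lastError`, and the "older than a threshold" rule come from the description.

```java
import java.time.Duration;
import java.time.Instant;

// Status names as used in this README.
enum Status { PENDING, RUNNING, COMPLETED, FAILED, CANCELLED }

// Hypothetical immutable job snapshot; the real entity may look different.
record Job(Status status, int retryCount, int maxRetries, String lastError, Instant startedAt) {}

final class RetryPolicySketch {

    // Called when a handler fails. Assumption: requeue while retries remain, otherwise mark FAILED.
    static Job onFailure(Job job, String error) {
        if (job.retryCount() < job.maxRetries()) {
            return new Job(Status.PENDING, job.retryCount() + 1, job.maxRetries(), error, null);
        }
        return new Job(Status.FAILED, job.retryCount(), job.maxRetries(), error, job.startedAt());
    }

    // Releases a RUNNING job back to PENDING when it started longer ago than the threshold.
    static Job releaseIfStale(Job job, Instant now, Duration threshold) {
        boolean stale = job.status() == Status.RUNNING
                && job.startedAt() != null
                && job.startedAt().plus(threshold).isBefore(now);
        return stale
                ? new Job(Status.PENDING, job.retryCount(), job.maxRetries(), job.lastError(), null)
                : job;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:10:00Z");
        Job running = new Job(Status.RUNNING, 0, 1, null, now.minus(Duration.ofMinutes(10)));
        System.out.println(onFailure(running, "boom").status());
        System.out.println(releaseIfStale(running, now, Duration.ofMinutes(5)).status());
    }
}
```

Keeping both transitions as pure functions over an immutable snapshot makes them easy to unit test without a database. In the real service they would presumably run inside a transaction against the job table.
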

                     ┌─────────────────┐
                     │    REST API     │  POST/GET/PATCH cancel
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │   PostgreSQL    │  job table (Flyway migrations)
                     └────────┬────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
    ┌───────────┐       ┌───────────┐       ┌───────────┐
    │  Worker   │       │  Worker   │       │  Worker   │  poll → claim → execute (thread pool)
    └───────────┘       └───────────┘       └───────────┘
          │
          ▼
    ┌───────────┐       ┌───────────┐
    │   Redis   │       │ Prometheus│  (optional: coordination / metrics)
    └───────────┘       └───────────┘

- Job store: PostgreSQL (single source of truth; status, payload, retries, timestamps).
- Workers: Same app or multiple instances; poll for PENDING jobs, claim with SELECT ... FOR UPDATE, execute in a thread pool.
- Handlers: Pluggable by job type (e.g. LOG_MESSAGE); add new types by implementing JobHandler and registering in the map.
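
The handler bullet above describes a lookup from job type to handler. A minimal sketch of that pattern follows, assuming a one-method `JobHandler` interface that receives the payload as a string. The method name `handle` and the `HandlerRegistrySketch` class are hypothetical, and the project's actual signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Assumed shape of the handler contract; the real interface may take a Job entity instead.
@FunctionalInterface
interface JobHandler {
    void handle(String payload);
}

final class HandlerRegistrySketch {
    private final Map<String, JobHandler> handlers;

    HandlerRegistrySketch(Map<String, JobHandler> handlers) {
        this.handlers = Map.copyOf(handlers); // immutable copy, safe to share across worker threads
    }

    // Looks up the handler by job type and runs it; unknown types fail fast.
    void dispatch(String type, String payload) {
        JobHandler handler = handlers.get(type);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for job type: " + type);
        }
        handler.handle(payload);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        JobHandler logMessage = payload -> log.add("LOG: " + payload);
        HandlerRegistrySketch registry = new HandlerRegistrySketch(Map.of("LOG_MESSAGE", logMessage));
        registry.dispatch("LOG_MESSAGE", "Hello");
        System.out.println(log); // prints [LOG: Hello]
    }
}
```

With this shape, adding a job type means writing one more `JobHandler` and adding one more map entry. The dispatch code itself does not change.
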
- Docker and Docker Compose
- Or: JDK 21, Gradle 8.x, PostgreSQL 16, Redis 7
git clone https://github.com/NullPoint3rDev/distributed-job-scheduler.git
cd distributed-job-scheduler
docker compose up -d --build

This starts:

| Service | Port | Description |
|---|---|---|
| Scheduler | 8080 | Application API & workers |
| PostgreSQL | 5432 | Job storage |
| Redis | 6379 | (available for extensions) |
| Prometheus | 9090 | Metrics |
| Grafana | 3000 | Dashboards (admin / admin) |
Wait ~30 seconds for the app to apply migrations, then:
# Health
curl -s http://localhost:8080/actuator/health | jq
# Create a job
curl -s -X POST http://localhost:8080/api/jobs \
-H "Content-Type: application/json" \
-d '{"type":"LOG_MESSAGE","payload":"Hello from Docker"}' | jq
# Get status (use jobId from above)
curl -s "http://localhost:8080/api/jobs/<jobId>" | jq

To run the app locally without the full Docker stack:

- Start infrastructure: docker compose up -d postgres redis (and optionally Prometheus).
- Set in application.yml or env: spring.datasource.url=jdbc:postgresql://localhost:5432/scheduler, spring.data.redis.host=localhost.
- Run: ./gradlew bootRun.

| Method | Path | Description |
|---|---|---|
| POST | /api/jobs | Create a job (body: type, optional payload, scheduledAt, maxRetries). |
| GET | /api/jobs/{id} | Get job by UUID or numeric id. |
| PATCH | /api/jobs/{id}/cancel | Cancel a PENDING job (idempotent for terminal states). |
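
The endpoints in the table above can also be called from Java. The sketch below uses the JDK's built-in `java.net.http` client to build and send the create request. The class name `CreateJobClientSketch` and the helper `buildCreateRequest` are illustrative, and the base URL assumes the Docker Compose setup on port 8080.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class CreateJobClientSketch {

    // Builds the POST /api/jobs request; baseUrl is wherever the scheduler is reachable.
    static HttpRequest buildCreateRequest(String baseUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/jobs"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires a running scheduler; otherwise send() throws a connection error.
        HttpRequest request = buildCreateRequest(
                "http://localhost:8080", "{\"type\":\"LOG_MESSAGE\",\"payload\":\"Hello\"}");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```
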
Create job (minimal):
POST /api/jobs
{"type": "LOG_MESSAGE", "payload": "Hello"}

Response (201): {"jobId": "uuid", "status": "PENDING"}
Get job (200): Full job payload including status, createdAt, startedAt, finishedAt, retryCount, lastError, etc.
Cancel (200): Returns the current job. If it was PENDING, its status becomes CANCELLED; if it was already COMPLETED, FAILED, or CANCELLED, the current state is returned unchanged. Cancelling a RUNNING job returns 409 Conflict.
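
The cancel rules above amount to a small decision table. A sketch in plain Java, where `CancelResult` and `CancelRulesSketch` are hypothetical names used only to make the mapping explicit:

```java
// Status names as used in this README.
enum JobStatus { PENDING, RUNNING, COMPLETED, FAILED, CANCELLED }

// Hypothetical result type: HTTP status code plus the job's status after the call.
record CancelResult(int httpStatus, JobStatus status) {}

final class CancelRulesSketch {

    // Maps the job's current status to the documented outcome of PATCH /api/jobs/{id}/cancel.
    static CancelResult cancel(JobStatus current) {
        return switch (current) {
            case PENDING -> new CancelResult(200, JobStatus.CANCELLED);    // cancelled now
            case RUNNING -> new CancelResult(409, JobStatus.RUNNING);      // conflict, job untouched
            case COMPLETED, FAILED, CANCELLED -> new CancelResult(200, current); // idempotent no-op
        };
    }

    public static void main(String[] args) {
        for (JobStatus s : JobStatus.values()) {
            System.out.println(s + " -> " + cancel(s));
        }
    }
}
```

Because the switch has no default branch, the compiler rejects it if a new status is added without deciding how cancellation should treat it.
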
- Health: GET /actuator/health
- Prometheus: GET /actuator/prometheus

Custom metrics (counter scheduler_jobs_total with tags):
- event = created, cancelled, claimed, completed, failed, retry, stale_released
- type = job type (e.g. LOG_MESSAGE)

In Grafana, add Prometheus with URL http://prometheus:9090 (from inside Docker) or http://localhost:9090 (local). Example query: scheduler_jobs_total.
./gradlew test

- Unit tests (e.g. JobService.cancel with mocks).
- Integration tests (API with H2 and disabled scheduling).

src/main/java/scheduler/
├── api/ # REST controller, DTOs, JobService (create, get, cancel)
├── config/ # Worker ID, ExecutorService, SchedulerMetrics
├── domain/ # Job entity, JobStatus
├── store/ # JobRepository (JPA)
├── services/ # JobClaimService, JobExecutionService, StaleJobRecoveryService, polling
└── worker/ # JobHandler interface, LogMessageHandler
This project is licensed under the MIT License — see the LICENSE file for details.