Evora is a robust distributed job queue system built on top of Postgres with exactly-once idempotency guarantees and scalable worker management.
```mermaid
graph LR
    subgraph Clients
        App([Producer App])
        Dash([Admin Dashboard])
    end
    subgraph Evora Server
        API[HTTP API]
        Dispatcher[Worker Dispatcher]
        Lifecycle[Lifecycle Manager]
        Sweeper[Visibility Sweeper]
        Projector[Job Projector]
    end
    subgraph Storage
        PG[(PostgreSQL<br>jobs & events)]
        Mongo[(MongoDB<br>telemetry)]
    end
    subgraph External Workers
        W1[Worker 1]
        W2[Worker 2]
    end

    %% Producer Flow
    App -->|POST /jobs| API
    API -->|Insert PENDING| PG

    %% Worker Flow
    W1 -->|GET /jobs/poll| API
    W2 -->|POST /jobs/:id/complete| API
    API --> Dispatcher
    API --> Lifecycle
    Dispatcher -->|FOR UPDATE SKIP LOCKED| PG
    Lifecycle -->|Update Status| PG

    %% Background Tasks
    Sweeper -.->|Find Expired Locks| PG

    %% Telemetry
    Lifecycle -.->|Publish Event| Projector
    Projector -->|Upsert Stats| Mongo
    Dash -->|GET /queues/stats| API
    API -->|Read| Mongo
```
This diagram shows how a job transitions through its states within Evora.
```mermaid
stateDiagram-v2
    [*] --> PENDING : Submitted by Client
    PENDING --> RUNNING : Claimed by Worker
    RUNNING --> COMPLETED : Worker finishes successfully
    RUNNING --> PENDING : Worker fails (Attempt < Max)
    RUNNING --> PENDING : Lock expires (Visibility Sweeper)
    RUNNING --> DLQ : Worker fails (Attempt >= Max)
    RUNNING --> DLQ : Lock expires (Attempt >= Max)
    DLQ --> PENDING : Manual Retry from Dashboard
    COMPLETED --> [*]
    DLQ --> [*]
```
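The failure and expiry transitions above reduce to a small pure function over the attempt count. Here is a minimal sketch, assuming a configurable `maxAttempts` cap; the class and method names are illustrative, not Evora's actual API:

```java
// State transition helper mirroring the diagram above. Status names match the
// diagram; the maxAttempts parameter is an assumed retry-policy knob.
class JobStateMachine {
    enum Status { PENDING, RUNNING, COMPLETED, DLQ }

    /** Next state after a worker failure or an expired lock. */
    static Status onFailureOrExpiry(int attemptCount, int maxAttempts) {
        // Requeue while attempts remain; otherwise dead-letter the job.
        return attemptCount < maxAttempts ? Status.PENDING : Status.DLQ;
    }
}
```

The same predicate drives both worker-reported failures and sweeper-detected lock expirations, which is why the two failure paths in the diagram converge on the same pair of targets.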
This sequence diagram illustrates the lifecycle of a single job, including idempotency checks and worker processing.
```mermaid
sequenceDiagram
    participant Client
    participant API as Evora API
    participant DB as Postgres DB
    participant Worker

    Client->>API: POST /jobs (idempotency_key)
    API->>DB: SELECT * WHERE idempotency_key
    alt Job Exists
        DB-->>API: Return existing Job
        API-->>Client: 200 OK (Already Exists)
    else Job Does Not Exist
        API->>DB: INSERT INTO jobs (status=PENDING)
        API-->>Client: 201 Created
    end

    loop Every 10ms
        Worker->>DB: UPDATE jobs SET status=RUNNING... FOR UPDATE SKIP LOCKED
        DB-->>Worker: Returns 1 Claimed Job
    end

    Worker->>Worker: Process Task (Long running)

    loop Every 20 seconds
        Worker->>API: POST /jobs/:id/heartbeat
        API->>DB: UPDATE locked_until = NOW() + 30s
    end

    Worker->>API: POST /jobs/:id/complete
    API->>DB: UPDATE jobs SET status=COMPLETED
```
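The heartbeat loop in the diagram is simple lease bookkeeping. A minimal sketch, assuming the 30-second lock window from the diagram; the class and method names are illustrative:

```java
import java.time.Duration;
import java.time.Instant;

// Lease bookkeeping behind the heartbeat loop. LEASE matches the 30-second
// lock window shown in the diagram; names here are not Evora's actual API.
class Lease {
    static final Duration LEASE = Duration.ofSeconds(30);

    /** New locked_until value when a job is claimed or a heartbeat arrives. */
    static Instant extend(Instant now) {
        return now.plus(LEASE);
    }

    /** A RUNNING job whose lease has lapsed is treated as abandoned. */
    static boolean expired(Instant lockedUntil, Instant now) {
        return lockedUntil.isBefore(now);
    }
}
```

With a 20-second heartbeat against a 30-second lease, a healthy worker always renews with roughly a 10-second margin before expiry, so only a silent (crashed or hung) worker lets the lease lapse.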
Our core mechanic for safely claiming jobs across multiple concurrent workers, without an external broker such as RabbitMQ or Redis, relies on Postgres's native `FOR UPDATE SKIP LOCKED`:
```sql
UPDATE jobs
SET status = 'RUNNING',
    worker_id = ?,
    locked_until = NOW() + (? || ' seconds')::INTERVAL,
    attempt_count = attempt_count + 1
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'PENDING'
      AND queue = ?
      AND scheduled_at <= NOW()
    ORDER BY priority ASC, scheduled_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
```

Why this guarantees exactly-once processing:
- **Idempotent Submission:** When a job is submitted, we first check the `idempotency_key`. If it exists, we return the existing job. This prevents the same operation from being queued multiple times.
- **Atomic Claiming:** `FOR UPDATE SKIP LOCKED` allows multiple workers to poll the table simultaneously. Instead of blocking each other waiting for a row lock, they instantly skip over rows already locked by other workers and grab the next available pending job.
- **Visibility Timeouts:** Rather than assuming a worker completed the job, we give it a "lease" (`locked_until`). If the worker crashes before completion, the lease expires. The `VisibilityTimeoutSweeper` will then requeue the job automatically.
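The idempotent-submission check can be illustrated without a database. In this sketch an in-memory map stands in for the `idempotency_key` lookup (in the real system the same effect comes from the check-then-insert shown in the sequence diagram); the class is illustrative:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the idempotency_key check. putIfAbsent plays the
// role of the database lookup: only the first submission creates a job.
class IdempotentSubmitter {
    private final ConcurrentMap<String, String> jobsByKey = new ConcurrentHashMap<>();

    /** Returns the job id; resubmitting the same key returns the original id. */
    String submit(String idempotencyKey) {
        String fresh = UUID.randomUUID().toString();
        String existing = jobsByKey.putIfAbsent(idempotencyKey, fresh);
        return existing != null ? existing : fresh; // "200 OK" vs "201 Created"
    }
}
```

Retried HTTP requests, double-clicked buttons, and at-least-once upstream events all collapse to a single queued job this way.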
Failure recovery works as follows:

- A worker polls and claims a job. The job status becomes `RUNNING` and a `locked_until` timestamp is set (e.g., 30 seconds into the future).
- The worker processes the job. If processing is slow, the worker periodically sends a heartbeat to extend the `locked_until` timestamp.
- If the worker crashes or hangs, it stops sending heartbeats.
- The `VisibilityTimeoutSweeper` runs periodically (e.g., every 10 seconds), finding jobs where `status = 'RUNNING'` and `locked_until < NOW()`.
- The sweeper resets the job to `PENDING` (requeuing it) and increments the attempt count. If the maximum attempt count is reached, it moves the job to the Dead Letter Queue (DLQ).
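One pass of that sweep can be sketched over in-memory records. The field names follow the `jobs` columns used elsewhere in this document; the classes themselves are illustrative, not Evora's actual implementation:

```java
import java.time.Instant;
import java.util.List;

// Illustrative in-memory version of one VisibilityTimeoutSweeper pass.
class SweeperSketch {
    enum Status { PENDING, RUNNING, COMPLETED, DLQ }

    static class Job {
        Status status;
        Instant lockedUntil;
        int attemptCount;
        Job(Status status, Instant lockedUntil, int attemptCount) {
            this.status = status;
            this.lockedUntil = lockedUntil;
            this.attemptCount = attemptCount;
        }
    }

    /** Requeue expired RUNNING jobs; dead-letter once the attempt cap is hit. */
    static void sweep(List<Job> jobs, Instant now, int maxAttempts) {
        for (Job j : jobs) {
            if (j.status == Status.RUNNING && j.lockedUntil.isBefore(now)) {
                j.attemptCount++; // a timeout counts as a failed attempt
                j.status = j.attemptCount >= maxAttempts ? Status.DLQ : Status.PENDING;
            }
        }
    }
}
```

Jobs with a live lease are untouched, so a slow-but-heartbeating worker is never preempted by the sweeper.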
A few design decisions worth noting:

- **Postgres as a Queue:** While dedicated message brokers are standard, using Postgres simplifies our operational overhead. With `SKIP LOCKED`, Postgres is highly capable of acting as a high-throughput queue while giving us transactional guarantees across business data and queue state.
- **Pull vs. Push:** Workers actively poll the queue instead of the system pushing to them. This inherently provides backpressure: workers only take what they can handle, which prevents downstream systems from being overwhelmed during traffic spikes.
- **Eventually Consistent Telemetry:** We separate operational queuing (Postgres) from telemetry (MongoDB). Domain events like `JobCompletedEvent` are published and projected into Mongo. This offloads expensive aggregation queries from our core Postgres database, keeping queue latency minimal.
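The projection step is an event fold. A minimal sketch of the shape, assuming completion events carry a queue name; the class, method names, and stats schema are illustrative, not Evora's actual projector:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative projector: fold JobCompleted-style events into per-queue
// counters, the kind of pre-aggregated stats the dashboard reads back.
class StatsProjector {
    private final Map<String, Long> completedByQueue = new ConcurrentHashMap<>();

    /** Handle one completion event (queue name assumed to be on the event). */
    void onJobCompleted(String queue) {
        completedByQueue.merge(queue, 1L, Long::sum); // analogue of a Mongo upsert
    }

    long completed(String queue) {
        return completedByQueue.getOrDefault(queue, 0L);
    }
}
```

Because the dashboard reads these rollups instead of scanning the `jobs` table, a heavy stats query never competes with claim transactions for Postgres locks.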
Start the supporting infrastructure:

```shell
docker-compose up -d
```

Run the application:

```shell
mvn clean compile exec:java -Dexec.mainClass="com.evora.EvoraApplication"
```