OpenDST is a Java library designed to enable Deterministic Simulation Testing (DST) for distributed systems. It allows developers to test complex, concurrent, and distributed logic in a completely deterministic environment, making flaky tests a thing of the past.
By intercepting non-deterministic operations (like time, threading, and randomness) and replacing them with a controlled simulation, OpenDST ensures that every test run produces the exact same result for a given seed.
- Deterministic Execution: Eliminates flaky tests by controlling the sources of non-determinism that OpenDST instruments (time, threads, randomness, and network).
- Virtual Time: Simulates time progression instantly. `Thread.sleep(Duration.ofHours(1))` completes in milliseconds.
- Deterministic Scheduling: Uses a custom scheduler (leveraging Java Virtual Threads) to control thread execution order.
- Controlled Randomness: Provides a deterministic source of randomness that can be seeded for reproducibility.
- Bytecode Instrumentation: Automatically intercepts JDK calls (e.g., `System.currentTimeMillis()`, `new Thread()`, `SecureRandom`) using a Java Agent, so you don't need to change your production code.

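The seed-driven reproducibility described in the feature list can be illustrated with plain `java.util.Random`. This is a minimal sketch, not OpenDST code: the class `SeededRandomDemo` and its `draw` method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: seed-driven reproducibility with plain java.util.Random. */
final class SeededRandomDemo {

    /** Draws n pseudo-random ints in [0, 1000) from a generator created with the given seed. */
    static List<Integer> draw(long seed, int n) {
        Random rng = new Random(seed);
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(rng.nextInt(1_000));
        }
        return values;
    }

    public static void main(String[] args) {
        // Same seed, same sequence: this is what makes a failing run replayable.
        System.out.println(draw(42L, 5).equals(draw(42L, 5))); // prints "true"
        // A different seed gives the simulation a different, but equally repeatable, sequence.
        System.out.println(draw(43L, 5));
    }
}
```

When every random decision in a run derives from one seed, recording that seed is enough to reproduce the run.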
OpenDST uses two instrumentation strategies. Application code is instrumented offline by the Maven plugin using the JDK 25 ClassFile API (JEP 484), which rewrites SDK stub call-sites at build time. JDK internals are handled at runtime by a lightweight Java Agent (SimulatorAgent) that intercepts non-deterministic APIs and redirects them to the Simulator.
When running inside a simulation:
- Time: `System.currentTimeMillis()` and `System.nanoTime()` return a simulated time that advances only when the simulator decides.
- Threads: `new Thread()` and `startVirtualThread()` are intercepted to run as virtual threads managed by the simulator's scheduler.
- Randomness: `ThreadLocalRandom`, `SecureRandom`, and `Random` are seeded deterministically.
- Network: Network interactions are simulated with a virtual TCP stack, with configurable latency, connection clogging, connection resets, and timeouts. See Fault Injection for the full list.

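To make the idea of simulated time concrete, here is a minimal sketch of a virtual-time event loop. It is an assumption about how such a clock can work in general, not OpenDST's implementation: `VirtualTimeDemo`, `schedule`, and `runUntilIdle` are hypothetical names.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a virtual-time event loop; not OpenDST's implementation. */
final class VirtualTimeDemo {

    /** A task due at a virtual timestamp; seq breaks ties in submission order. */
    private static final class ScheduledTask implements Comparable<ScheduledTask> {
        final long dueNanos;
        final long seq;
        final Runnable action;

        ScheduledTask(long dueNanos, long seq, Runnable action) {
            this.dueNanos = dueNanos;
            this.seq = seq;
            this.action = action;
        }

        @Override
        public int compareTo(ScheduledTask other) {
            int byTime = Long.compare(dueNanos, other.dueNanos);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<ScheduledTask> tasks = new PriorityQueue<>();
    private long nowNanos;
    private long nextSeq;

    /** Simulated counterpart of System.nanoTime(). */
    long nanoTime() {
        return nowNanos;
    }

    /** Schedules an action to run after the given amount of virtual time. */
    void schedule(Duration delay, Runnable action) {
        tasks.add(new ScheduledTask(nowNanos + delay.toNanos(), nextSeq++, action));
    }

    /** Runs tasks in due order, jumping the clock to each due time instead of waiting. */
    void runUntilIdle() {
        ScheduledTask next;
        while ((next = tasks.poll()) != null) {
            nowNanos = next.dueNanos;
            next.action.run();
        }
    }

    public static void main(String[] args) {
        VirtualTimeDemo sim = new VirtualTimeDemo();
        List<String> log = new ArrayList<>();
        sim.schedule(Duration.ofHours(1), () -> log.add("1 h timer fired at " + sim.nanoTime() + " ns"));
        sim.schedule(Duration.ofMillis(5), () -> log.add("5 ms timer fired at " + sim.nanoTime() + " ns"));
        sim.runUntilIdle(); // returns immediately in wall-clock terms
        log.forEach(System.out::println);
    }
}
```

Because the clock only moves when the loop pops the next task, a one-hour delay costs no real time, and the order of events depends only on the scheduled timestamps.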
OpenDST aims for strict determinism, but some JDK APIs are not yet isolated or fully instrumented. For an exhaustive list of known gaps (including File I/O and certain system properties), see KNOWN_GAPS.md.
When network fault injection is enabled (the default), the simulator injects the following faults into the simulated network stack. All faults are deterministic (driven by a seeded RNG) and gated on Signals.ready() — no faults fire during application startup.
| Fault | Default | Description |
|---|---|---|
| Bimodal latency | 99.9% fast (100µs–800µs), 0.1% slow (up to 100ms) | Every send/receive incurs a half-round-trip delay drawn from a bimodal distribution |
| Connection clogging | Random per connection pair, up to 100ms | Each (source, destination) address pair is assigned a fixed random additional latency on first connection |
| Bind failure | 5% when SO_REUSEADDR is off | bind() throws BindException to simulate address-in-use races |
| Connection reset | 0.1% per send or receive | Throws SocketException("Connection reset") |
| Timeout | 0.1% per send or receive | Injects a 10-second delay then throws SocketTimeoutException |
| Partial receive | 95% of reads | Receiver gets a random subset of available bytes instead of all of them |
| Variable send buffer | Random per connection | Send buffer size varies between connections, creating back-pressure |
| Sender transit delay | 0–2ms per write | Random delay between a write completing and the data arriving at the receiver |
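The bimodal latency and connection clogging rows can be sketched as a small seeded model. This is an illustration under stated assumptions, not the simulator's code: `LatencyModelDemo` and its methods are hypothetical, and the lower bound of the slow range (taken here as the upper bound of the fast range) is an assumption because the table only says "up to 100ms".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical latency model; constants mirror the defaults in the fault table. */
final class LatencyModelDemo {
    static final long FAST_MIN_NANOS = 100_000L;      // 100 microseconds
    static final long FAST_MAX_NANOS = 800_000L;      // 800 microseconds
    static final long SLOW_MAX_NANOS = 100_000_000L;  // 100 milliseconds
    static final double SLOW_PROBABILITY = 0.001;     // 0.1% of operations

    private final Random rng;
    private final Map<String, Long> clogNanosByPair = new HashMap<>();

    LatencyModelDemo(long seed) {
        this.rng = new Random(seed);
    }

    /** Half-round-trip delay: usually fast, occasionally slow. */
    long sampleHalfRoundTripNanos() {
        boolean slow = rng.nextDouble() < SLOW_PROBABILITY;
        // Assumption: the slow range starts where the fast range ends.
        long min = slow ? FAST_MAX_NANOS : FAST_MIN_NANOS;
        long max = slow ? SLOW_MAX_NANOS : FAST_MAX_NANOS;
        return min + (long) (rng.nextDouble() * (max - min));
    }

    /** Extra latency for a (source, destination) pair, drawn once and then fixed. */
    long clogNanos(String source, String destination) {
        return clogNanosByPair.computeIfAbsent(
                source + "->" + destination,
                pair -> (long) (rng.nextDouble() * SLOW_MAX_NANOS));
    }

    public static void main(String[] args) {
        LatencyModelDemo a = new LatencyModelDemo(7L);
        LatencyModelDemo b = new LatencyModelDemo(7L);
        for (int i = 0; i < 5; i++) {
            // Both columns match because the two models share a seed.
            System.out.println(a.sampleHalfRoundTripNanos() + " " + b.sampleHalfRoundTripNanos());
        }
        long first = a.clogNanos("10.0.0.2", "10.0.0.1");
        System.out.println(first == a.clogNanos("10.0.0.2", "10.0.0.1")); // prints "true"
    }
}
```

Drawing every delay from one seeded generator is what keeps fault injection reproducible: the same seed yields the same sequence of fast and slow operations.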
The simulator also injects scheduling faults:

| Fault | Default | Description |
|---|---|---|
| Thread scheduling jitter | 1–10,000 ns | Random delay on virtual thread wake-up to explore different interleaving orders |
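The effect of wake-up jitter on interleavings can be sketched as follows. This is a simplified model, not the scheduler itself: `WakeUpJitterDemo` and `wakeOrder` are hypothetical names, and the sketch assumes all tasks become runnable at the same virtual instant.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: seeded wake-up jitter decides the order of simultaneous tasks. */
final class WakeUpJitterDemo {

    /** Returns task names in wake-up order after adding 1..10,000 ns of seeded jitter each. */
    static List<String> wakeOrder(long seed, List<String> tasks) {
        Random rng = new Random(seed);
        int n = tasks.size();
        long[] wakeNanos = new long[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            wakeNanos[i] = 1 + rng.nextInt(10_000); // jitter in [1, 10_000] ns
            order.add(i);
        }
        // Ties fall back to submission order, so the result depends only on the seed.
        order.sort(Comparator.<Integer>comparingLong(idx -> wakeNanos[idx])
                .thenComparingInt(idx -> idx));
        List<String> names = new ArrayList<>();
        for (int idx : order) {
            names.add(tasks.get(idx));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("reader", "writer", "heartbeat");
        // Different seeds may explore different interleavings...
        System.out.println(wakeOrder(1L, tasks));
        System.out.println(wakeOrder(2L, tasks));
        // ...while the same seed always reproduces the same one.
        System.out.println(wakeOrder(1L, tasks).equals(wakeOrder(1L, tasks))); // prints "true"
    }
}
```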
These are not faults per se, but JVM behaviors that OpenDST overrides to ensure determinism:
- Identity hash codes: `-XX:hashCode=2` for deterministic `System.identityHashCode()`
- Collection salt: `ImmutableCollections.SALT32L` and `REVERSE` overridden per node
- RNG: `ThreadLocalRandom`, `SecureRandom`, and `Random` seeded deterministically
- Time: `System.currentTimeMillis()`, `System.nanoTime()`, `Instant.now()` return simulated virtual time
- Threads: All `Thread` constructors redirected to virtual threads on the simulator's scheduler
- GC: `ReferenceQueue.poll()` returns `null` (deterministic GC behavior)
- Process exit: `Runtime.exit()` throws `SystemExitError` instead of halting the JVM

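The process-exit override can be pictured with a small sketch. Everything here is an assumption made for illustration: the `SystemExitError` class below is a hypothetical stand-in that shares only its name with the real type, and `simulatedExit` and `runNode` are invented helpers. The likely motivation is that a simulated service calling exit should end that service only, not the whole simulation.

```java
/** Hypothetical stand-in for the error type named above; the real class may differ. */
final class SystemExitError extends Error {
    final int status;

    SystemExitError(int status) {
        super("exit(" + status + ")");
        this.status = status;
    }
}

/** Hypothetical sketch of how a harness could treat an intercepted exit call. */
final class ExitInterceptDemo {

    /** Stand-in for an intercepted Runtime.exit(status): throws instead of halting the JVM. */
    static void simulatedExit(int status) {
        throw new SystemExitError(status);
    }

    /** Runs one simulated service entry point and reports its exit status. */
    static int runNode(Runnable nodeMain) {
        try {
            nodeMain.run();
            return 0; // returned normally
        } catch (SystemExitError e) {
            return e.status; // the service asked to exit; the harness keeps running
        }
    }

    public static void main(String[] args) {
        System.out.println(runNode(() -> simulatedExit(3))); // prints "3"
        System.out.println(runNode(() -> { }));              // prints "0"
    }
}
```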
Requirements:

- Java 25 or later.
- Maven.

The Maven plugin's build goal (bound to the package phase) instruments your code offline and packages everything into a self-contained -opendst.jar. To run a simulation, execute the JAR directly:
```bash
mvn clean package
java -jar target/*-opendst.jar
```

Each service is a class with a `public static void main(String[])` entry point. Use the `opendst-sdk` for assertions and lifecycle signals:
```java
import com.pingidentity.opendst.sdk.Assert;
import com.pingidentity.opendst.sdk.Signals;

import java.io.*;
import java.net.*;

public class EchoApp {

    public static class Server {
        public static void main(String[] args) throws Exception {
            var port = Integer.parseInt(args[0]);
            try (var ss = new ServerSocket(port);
                 var socket = ss.accept();
                 var in = new DataInputStream(socket.getInputStream());
                 var out = new DataOutputStream(socket.getOutputStream())) {
                Signals.ready();
                int value = in.readInt();
                Assert.reachable("server-received", null);
                out.writeInt(value + 1);
            }
        }
    }

    public static class Client {
        public static void main(String[] args) throws Exception {
            var host = args[0];
            var port = Integer.parseInt(args[1]);
            Signals.ready();
            try (var socket = new Socket(host, port);
                 var out = new DataOutputStream(socket.getOutputStream());
                 var in = new DataInputStream(socket.getInputStream())) {
                out.writeInt(42);
                int response = in.readInt();
                Assert.always(response == 43, "echo-correct", null);
            }
        }
    }
}
```

Describe the deployment topology in a `deployment.yaml`:
```yaml
services:
  server:
    class: com.example.EchoApp$Server
    ip: 10.0.0.1
    args: ["8080"]
  client:
    class: com.example.EchoApp$Client
    ip: 10.0.0.2
    args: ["10.0.0.1", "8080"]
```

Configure the Maven plugin with the `build` goal:
```xml
<plugin>
  <groupId>com.pingidentity.opendst</groupId>
  <artifactId>opendst-maven-plugin</artifactId>
  <version>0.1.0-SNAPSHOT</version>
  <executions>
    <execution>
      <goals><goal>build</goal></goals>
    </execution>
  </executions>
</plugin>
```

Run the simulation from the produced JAR. All orchestration parameters are CLI arguments:
```bash
java -jar target/*-opendst.jar \
  --stagnation-limit 100 \
  --duration 100000 \
  --branch-probability 0.7 \
  --replay-probability 0.05 \
  --working-dir target/opendst-work
```

Available CLI options:
| Option | Default | Description |
|---|---|---|
| `--duration` | `100000` | Maximum number of simulation steps per execution |
| `--stagnation-limit` | `100` | Stop after N executions without new coverage |
| `--branch-probability` | `0.7` | Probability of branching to explore a new path |
| `--replay-probability` | `0.05` | Probability of replaying a previous trace |
| `--fork-count` | `max(1, CPUs/2 - 1)` | Number of concurrent simulation forks. Supports `C` suffix (e.g. `1C`, `0.5C`) |
| `--working-dir` | (JAR name sans `.jar`) | Persistent working directory for deployment, runs, and reports |
| `--stop` | (none) | Early-stopping conditions (combinable): `first-fail` (stop on first assertion failure), `first-pass` (stop when all assertions pass after `stagnation-limit` runs). Omit for default behavior (run until stagnation) |
| `--plan` | (none) | Replay a saved plan file instead of exploring |
| `--extra-jvm-args` | (none) | Additional JVM arguments appended to build-time defaults |

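The `--fork-count` arithmetic can be worked through with a short sketch. This is one plausible reading of the table, not the tool's parser: `ForkCountDemo` is a hypothetical class, and the integer division, rounding down, and minimum of one fork for `C` values are assumptions.

```java
/** Hypothetical sketch of the --fork-count default and C-suffix arithmetic. */
final class ForkCountDemo {

    /** Default from the options table: max(1, CPUs/2 - 1). Integer division is an assumption. */
    static int defaultForkCount(int cpus) {
        return Math.max(1, cpus / 2 - 1);
    }

    /** Parses "3", "1C" or "0.5C"; a C suffix scales the number by the CPU count. */
    static int parseForkCount(String value, int cpus) {
        String v = value.trim();
        if (v.endsWith("C")) {
            double factor = Double.parseDouble(v.substring(0, v.length() - 1));
            // Assumption: round down, but never go below one fork.
            return Math.max(1, (int) Math.floor(factor * cpus));
        }
        return Math.max(1, Integer.parseInt(v));
    }

    public static void main(String[] args) {
        int cpus = 8; // fixed here for a predictable example
        System.out.println(defaultForkCount(cpus));       // prints "3"
        System.out.println(parseForkCount("1C", cpus));   // prints "8"
        System.out.println(parseForkCount("0.5C", cpus)); // prints "4"
        System.out.println(parseForkCount("3", cpus));    // prints "3"
    }
}
```

On an 8-CPU machine the default therefore leaves more than half of the cores free, while `1C` uses one fork per core.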
The working directory has the following structure:
```
myapp-opendst/       # --working-dir (default: JAR path minus .jar)
  deployment/        # extracted from JAR (skipped if already present)
  runs/              # ephemeral per-fork directories
  report/            # simulation output (persists across runs)
    report.json
    plans/           # execution plans and simulator logs
```
- `Simulator`: The core engine that manages the simulation loop, virtual time, and task scheduling.
- `SimulatorAgent`: A Java Agent that uses the ClassFile API (JEP 484) to intercept JDK methods and redirect them to the `Simulator`.
- `Node`: Represents a node in the distributed system simulation (context for the current execution).

To learn more about Deterministic Simulation Testing and why it is a game-changer for distributed systems reliability, check out these resources:
- Antithesis: Deterministic Simulation Testing: A deep dive into determinism and its role in testing.
- Testing Distributed Systems with Deterministic Simulation (Will Wilson, FoundationDB): The seminal talk that popularized the technique.
- FoundationDB Testing: Detailed documentation on how FoundationDB uses simulation.
- TigerBeetle: Simulation: How TigerBeetle uses deterministic simulation to ensure correctness.
- Rewriting the heart of our sync engine: How Dropbox used deterministic simulation to rewrite their synchronization engine.
This project is licensed under the Apache License, Version 2.0.