PaulHosek/Market_Data_Dissemination_Simulator

Market Data Dissemination

This project implements a high-throughput, low-latency market data dissemination pipeline in C++23. I also added a benchmarking framework to evaluate the performance of different Single-Producer Single-Consumer (SPSC) lock-free queue architectures, wait strategies, and network transport protocols under heavy load.


Part 1: Performance Analysis

To explore the results interactively, run the Streamlit app:

streamlit run app.py

Note: All analysis was run in a WSL2 instance on a Ryzen 7 9800X3D.

End-to-end latency

Lock-Free Queue: Custom vs. Boost

Comparison of the internal std::atomic ring buffer against boost::lockfree::spsc_queue, evaluating median latency and tail behavior (p99) under a busy-spin wait strategy.
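For readers unfamiliar with the technique, here is a minimal sketch of a bounded SPSC ring buffer built on std::atomic. It is not the repository's implementation: the class name SpscRing, the try_push/try_pop interface, the power-of-two capacity requirement, and the 64-byte alignment are all assumptions made for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical sketch of a bounded SPSC ring buffer (not the project's code).
// Exactly one thread may call try_push and exactly one may call try_pop.
// T must be default-constructible and copyable in this simplified version.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) {
            return false;  // full: consumer has not freed a slot yet
        }
        slots_[head & (Capacity - 1)] = value;
        // Release makes the slot write visible before the new head is seen.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) {
            return std::nullopt;  // empty
        }
        T value = slots_[tail & (Capacity - 1)];
        // Release lets the producer reuse the slot only after the read.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines (64 bytes assumed) to limit false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by consumer
};
```

Under a busy-spin wait strategy, the consumer would simply call try_pop in a loop until it returns a value, and the producer would retry try_push while it returns false.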

Network Protocol: UDP Multicast vs. ZeroMQ TCP

Evaluation of network-layer overhead, comparing the latency distributions of raw UDP Multicast against ZeroMQ over TCP.

Maximum Throughput and Backpressure

Throughput stress testing up to 3,000,000 messages per second to observe OS socket buffer overflow (packet loss) in UDP versus TCP window backpressure mechanisms in ZeroMQ. Note: The results show a large proportion of packet drops; the cause still needs to be investigated.
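One common way to quantify UDP packet loss on the receiving side is to count gaps in a per-message sequence number. The sketch below assumes each message carries a monotonically increasing sequence number; the README does not say how the project counts drops, so GapCounter and its members are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical receiver-side loss counter (an assumption, not the project's code).
// Assumes the sender stamps every message with a sequence number that
// increases by exactly one per message.
struct GapCounter {
    std::uint64_t received = 0;
    std::uint64_t lost = 0;
    std::uint64_t next_expected = 0;
    bool started = false;

    void on_message(std::uint64_t seq) {
        if (!started) {
            started = true;
            next_expected = seq;  // first message defines the baseline
        }
        if (seq < next_expected) {
            return;  // late or duplicate datagram; ignored in this sketch
        }
        lost += seq - next_expected;  // every skipped number counts as a drop
        ++received;
        next_expected = seq + 1;
    }

    // Fraction of messages in the observed sequence range that never arrived.
    double loss_ratio() const {
        const std::uint64_t total = received + lost;
        return total == 0 ? 0.0
                          : static_cast<double>(lost) / static_cast<double>(total);
    }
};
```

Because reordered datagrams are ignored rather than reconciled, this sketch can overstate loss when packets arrive out of order.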

---

Part 2: System Architecture and Usage

Architecture Overview

The system models a standard exchange or proprietary trading feed architecture:

  1. Generator: Produces simulated Quote and Trade messages via a random walk model. Utilizes busy-wait interval timers to bypass OS sleep granularity limitations.
  2. Lock-Free Queue: Bridges the producer (generator) and consumer (disseminator) threads. Configurable with Spin (busy-wait) or Waitable (condition variable) strategies.
  3. Disseminator: Serializes messages and writes them to the network socket.
  4. Feed Handler: Ingests network data, applies symbol-based filtering, and captures nanosecond-precision receipt timestamps.
  5. Latency Monitor: A zero-allocation tracking component that aggregates internal software overhead (queue_ns) and network stack overhead (network_ns).
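The busy-wait interval timer used by the generator (step 1) can be sketched as follows. This is an illustration under assumptions, not the project's code: the class name BusyWaitPacer, its interface, and the choice of std::chrono::steady_clock are hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical pacer that spins until each message's scheduled send time
// instead of sleeping, so the interval is not limited by OS sleep granularity.
class BusyWaitPacer {
public:
    using Clock = std::chrono::steady_clock;

    // messages_per_second must be between 1 and 1'000'000'000.
    explicit BusyWaitPacer(std::uint64_t messages_per_second)
        : interval_(std::chrono::duration_cast<Clock::duration>(
              std::chrono::nanoseconds(1'000'000'000 / messages_per_second))),
          next_(Clock::now() + interval_) {}

    // Spins until the next slot is due, then schedules the following slot.
    void wait_for_next_slot() {
        while (Clock::now() < next_) {
            // Busy-wait: burns a core in exchange for precise timing.
        }
        // Advance from the schedule, not from "now", so delays do not
        // accumulate as drift over a long run.
        next_ += interval_;
    }

private:
    Clock::duration interval_;
    Clock::time_point next_;
};
```

A generator loop would call wait_for_next_slot() before producing each message. If the loop falls behind, the pacer returns immediately until it has caught up with the schedule.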

Project Structure

├── python/                 # Analytical suite and plotting scripts
├── src/
│   ├── disseminator/       # Network publishers (UDP, ZMQ)
│   ├── feedhandler/        # Network subscribers and filter logic
│   ├── generator/          # Market data simulation
│   ├── monitor/            # Latency telemetry collection
│   ├── utils/              # SPSC queues, types, and configurations
│   └── main.cpp            # Application entry point and CLI router
├── tests/                  # GTest unit and integration tests
└── CMakeLists.txt

Requirements

  • Compiler supporting C++23 (GCC, Clang, or MSVC)
  • CMake 3.28 or higher
  • vcpkg (for C++ dependency management)
  • Python 3.8+ (for the analytical suite)

C++ Dependencies (managed via vcpkg):

  • boost-lockfree
  • spdlog
  • cppzmq
  • zeromq
  • cxxopts
  • gtest

Building from Source

The project relies on CMake and vcpkg. To build the executable and tests from the command line:

mkdir build && cd build

cmake .. -DCMAKE_TOOLCHAIN_FILE=[path_to_vcpkg]/scripts/buildsystems/vcpkg.cmake -DCMAKE_BUILD_TYPE=Release

cmake --build . --config Release

Running the C++ Benchmark

The compiled binary main_simulate accepts command-line arguments that select the pipeline configuration. Queue sizes are resolved at compile time via template dispatch, so selecting a size adds no runtime overhead in the hot path.

./main_simulate --queue spin --size 4096 --transport zmq --rate 100000 --duration 10 --symbols ../data/tickers.txt --out ../data

Available Options:

  • -q, --queue: Wait strategy (spin or waitable)
  • -u, --underlying: Queue implementation (custom or boost)
  • -s, --size: Queue capacity (128, 512, 1024, 4096, 16384, 65536)
  • -t, --transport: Network protocol (udp or zmq)
  • -r, --rate: Target message rate in messages per second
  • -d, --duration: Benchmark duration in seconds
  • -f, --symbols: Path to the subscription symbols list
  • -o, --out: Output directory for the resulting CSV files
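The compile-time resolution of --size described above can be illustrated with a small dispatch helper: a runtime value is matched against the supported capacities, and each branch passes a compile-time constant to the caller. This is a sketch under assumptions; dispatch_queue_size is a hypothetical name and the project's actual dispatch code may look different.

```cpp
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical helper: maps a runtime --size value onto a compile-time constant.
// fn is called with std::integral_constant<std::size_t, N>, so N can be used
// as a template argument inside fn. fn must return the same type for every N.
template <typename Fn>
decltype(auto) dispatch_queue_size(std::size_t size, Fn&& fn) {
    using std::integral_constant;
    switch (size) {
        case 128:   return std::forward<Fn>(fn)(integral_constant<std::size_t, 128>{});
        case 512:   return std::forward<Fn>(fn)(integral_constant<std::size_t, 512>{});
        case 1024:  return std::forward<Fn>(fn)(integral_constant<std::size_t, 1024>{});
        case 4096:  return std::forward<Fn>(fn)(integral_constant<std::size_t, 4096>{});
        case 16384: return std::forward<Fn>(fn)(integral_constant<std::size_t, 16384>{});
        case 65536: return std::forward<Fn>(fn)(integral_constant<std::size_t, 65536>{});
        default:
            throw std::invalid_argument("unsupported queue size");
    }
}
```

A caller would pass a generic lambda such as `[](auto n) { /* instantiate a queue with capacity decltype(n)::value and run the pipeline */ }`. The switch runs once at startup, and everything inside the lambda is compiled separately for each capacity.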

Running the Analytical Suite

The Python scripts process the CSV outputs generated by the C++ backend and render statistical distributions.

cd python
pip install -r requirements.txt 

streamlit run plot_latency.py
python3 [choose_any].py
