Intuition: A high-performance time-series database engine built from scratch in C++, paired with a real-time streaming analytics pipeline. The project is an attempt to understand both low-level systems programming and high-throughput data engineering.
The system is designed to:
- Store numerical time-series data with maximum efficiency using custom C++ compression.
- Query data with sub-millisecond latency for interactive analysis.
- Monitor the health of the data stream in real-time to detect anomalies and ensure data integrity.
- Custom C++ Storage Engine: Core logic is written in C++17 for direct memory control, achieving a 2.0x compression ratio with custom Delta-of-Delta and XOR algorithms.
- High-Performance Querying: A time-sharded architecture enables sub-16ms cold query latencies, avoiding slow full-disk scans.
- Real-Time Anomaly Detection: A streaming pipeline uses Apache Spark to continuously analyze metrics from an Apache Kafka topic, detecting data integrity failures (e.g., compression ratio drops) in real-time.
- High-Throughput Ingestion: The Kafka pipeline is tuned for performance, capable of sustaining ingestion rates of over 250,000 messages/sec on commodity hardware.
- Modern API & Tooling: The system is fully containerized with Docker Compose, built with CMake, and includes a Python/FastAPI service for RESTful access.
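The Delta-of-Delta idea behind the timestamp compression can be illustrated in a few lines. The sketch below shows the encoding scheme only, in Python; the real engine bit-packs the results in C++, and the function names here are illustrative, not the engine's API:

```python
def delta_of_delta_encode(timestamps):
    """Encode as: first timestamp, first delta, then delta-of-deltas.

    Regularly spaced series produce long runs of zeros, which a
    bit-packer can store in very few bits each.
    """
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev = timestamps[0]
    prev_delta = None
    for t in timestamps[1:]:
        delta = t - prev
        if prev_delta is None:
            out.append(delta)               # first delta stored as-is
        else:
            out.append(delta - prev_delta)  # delta-of-delta
        prev_delta = delta
        prev = t
    return out


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode by re-accumulating the deltas."""
    if not encoded:
        return []
    ts = [encoded[0]]
    if len(encoded) == 1:
        return ts
    delta = encoded[1]
    ts.append(ts[-1] + delta)
    for dod in encoded[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts
```

For a series sampled every 10 time units, the encoded stream is almost all zeros, which is where the compression wins come from.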
The project is a hybrid architecture combining a high-performance C++ database with a scalable data engineering pipeline for real-time monitoring.
```mermaid
graph TD
    subgraph "Streaming & Real-Time Analytics Pipeline"
        Producer[C++ Engine / Data Source] -- "1. Raw Metrics Stream" --> Kafka[Apache Kafka Topic<br/>'raw-metrics'];
        Kafka -- "2. Read Stream" --> Spark[Spark Structured Streaming];
        Spark -- "3. Detect Anomalies" --> KafkaAlerts[Apache Kafka Topic<br/>'alerts'];
        KafkaAlerts -- "4. Consume Alerts" --> Consumer[Monitoring Dashboard / Alerting System];
    end
    subgraph "Storage & Query Engine (for Historical Analysis)"
        Producer -- "Direct Ingest via API" --> API[FastAPI Service];
        API -- "C++ Bridge" --> CppEngine[High-Performance C++ Engine];
        CppEngine <--> Shards[(Disk Storage:<br/>Time-Sharded .bin Files)];
        Client[GUI / API Client] -- "Query Time Range" --> API;
    end
```
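The time-sharded layout is what keeps cold queries fast: a query's time range maps directly to a small set of `.bin` shard files, so only those files are opened. A minimal Python sketch of that mapping (the shard duration and file-naming scheme here are hypothetical, not the engine's actual on-disk format):

```python
SHARD_SECONDS = 3600  # hypothetical: one shard file per hour of data


def shards_for_range(start_ts, end_ts, shard_seconds=SHARD_SECONDS):
    """Return the shard filenames covering [start_ts, end_ts].

    A 24-hour query touches at most 25 hourly shards instead of
    scanning every file on disk.
    """
    first = start_ts // shard_seconds
    last = end_ts // shard_seconds
    return [f"shard_{h * shard_seconds}.bin" for h in range(first, last + 1)]
```

This is the design choice behind the sub-16ms cold-query figure: the cost of a query scales with the size of its time window, not with the total size of the database.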
The benchmarks were run extensively on commodity hardware: an Intel Core i7-12700H (14C/20T), DDR4-3200 RAM, and a PCIe Gen 4 NVMe SSD.
| Metric | Result | Analysis |
|---|---|---|
| Kafka Ingestion Throughput | ~267,000 ops/sec | Validates the scalability of the Kafka pipeline, achieved by using a parallel Python producer to bypass the GIL and saturate multiple cores. |
| Storage Efficiency | ~8.2 bytes/point | A 50% storage reduction via custom C++ compression on high-entropy data. |
| Cold Query Latency (p99) | ~12 ms | Querying a 24-hour window proves the time-sharded C++ design avoids slow full-disk scans for historical data analysis. |
| C++ Ingestion Throughput | ~4,000 points/sec | Baseline for unbatched, single-point file I/O. |
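The storage figure is consistent with the headline compression ratio: a raw point is 16 bytes (a 64-bit timestamp plus a 64-bit double), so ~8.2 bytes/point works out to roughly a 2x reduction:

```python
raw_bytes_per_point = 8 + 8          # 64-bit timestamp + 64-bit double value
compressed_bytes_per_point = 8.2     # measured figure from the table above
ratio = raw_bytes_per_point / compressed_bytes_per_point
print(f"compression ratio ~= {ratio:.2f}x")  # ~1.95x, i.e. ~50% reduction
```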
- Docker & Docker Compose
- Git
- Python 3.10+ & venv
Clone the repository and use Docker Compose to build and start all services in the background.
```bash
git clone https://github.com/KaranSinghDev/Time-Series-Database-Engine.git
cd Time-Series-Database-Engine
docker-compose up -d
```

To view the Spark UI, open your browser to http://localhost:8080.
This demonstrates the end-to-end streaming pipeline. Open three separate terminals.
```bash
# Terminal 1: Start the Alert Consumer
python alert_consumer.py

# Terminal 2: Start the Data Producer
python producer.py

# Terminal 3: Submit the Spark Streaming Job
docker exec -it tsdb-spark-master /opt/spark/bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 \
  /app/stream_processor.py
```

You will now see alerts appear in Terminal 1 whenever the producer in Terminal 2 simulates a compression failure.
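The integrity rule this demo exercises is simple: alert whenever a metric's reported compression ratio falls below a healthy level. A plain-Python version of that per-record check is sketched below; the real job runs in Spark Structured Streaming, and the threshold and field names here are assumptions rather than values taken from `stream_processor.py`:

```python
RATIO_THRESHOLD = 1.5  # hypothetical alert threshold; the real job's value may differ


def check_metric(metric):
    """Return an alert dict when the reported compression ratio drops
    below the threshold, or None when the metric looks healthy."""
    if metric.get("compression_ratio", 0.0) < RATIO_THRESHOLD:
        return {"type": "COMPRESSION_FAILURE", "metric": metric}
    return None
```

In the real pipeline, records flagged this way are what get written to the `alerts` Kafka topic and picked up by `alert_consumer.py`.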
This single script allows you to test both the C++ engine and the Kafka pipeline.
```bash
# In a new terminal, with the venv activated
python benchmark.py
```

You will be presented with an interactive menu to choose which benchmark to run.
To work on the C++ engine directly, you'll need a C++17 compiler and CMake.
```bash
cd engine
cmake -B build
cmake --build build

# From the 'engine' directory
./build/engine_test
```