A deterministic streaming-style data pipeline that validates financial transaction events, evaluates compliance rules in near-real time, and produces reproducible audit artifacts, including flags, checkpoints, and run logs.
Financial systems generate continuous streams of transaction events.
Compliance monitoring requires detecting suspicious patterns such as:
- unusually large transactions
- rapid bursts of deposits
- repeated suspicious behavior within short windows
These checks must run on a stream of incoming events, while ensuring:
- strict schema validation
- deterministic rule evaluation
- no event skipping
- restart-resume safety
- auditable run artifacts
This project implements a local streaming compliance pipeline that consumes an append-only event stream and produces rule-based compliance flags with full run evidence.
Pipeline flow:
- Event generator produces transaction events.
- Events are appended to a persisted stream file.
- The watcher process consumes new events using byte offsets.
- Each event is validated against a strict contract.
- Valid events are evaluated against compliance rules.
- Suspicious patterns emit flag records.
- Checkpoints persist stream progress.
- Run logs and validation reports capture execution evidence.
- Evidence notebook inspects artifacts for verification.
This architecture models a simplified stream processing system with deterministic rule evaluation and restart-safe consumption.
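As a minimal sketch of the first two steps above, a generator might append each transaction event as one JSON line to the persisted stream file. The field names and ID scheme here are illustrative assumptions, not the repository's actual contract:

```python
import json
import time

def append_event(stream_path, user_id, amount, event_type):
    """Append one transaction event as a single JSON line (illustrative schema)."""
    event = {
        "event_id": f"{user_id}-{int(time.time() * 1000)}",  # hypothetical ID scheme
        "user_id": user_id,
        "type": event_type,   # e.g. "deposit" or "withdrawal"
        "amount": amount,
        "ts": time.time(),
    }
    # Append-only: the watcher later consumes new lines by byte offset.
    with open(stream_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

One event per line (JSONL) keeps the stream append-only and makes byte-offset consumption straightforward.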
- Python
- JSONL event streaming
- Contract-based validation
- Stateful rule evaluation
- Pytest testing framework
- Jupyter notebook for run evidence
local-stream-data-product/
  data/
    sample_stream.log        sample persisted stream input
  src/
    contracts/
      event_contract.py      schema validation logic
    pipelines/
      rules.py               compliance rule evaluation
      watcher.py             stream consumer and rule engine
    stream/
      generator.py           event stream producer
  tests/
    test_contracts.py
    test_rules.py
    test_generator.py
    test_watcher.py
    test_smoke_pipeline.py
  outputs/
    checkpoints/             stream progress checkpoints
    flags/                   emitted compliance flags
    reports/                 validation reports
  logs/                      run logs capturing execution metadata
  docs/
    architecture.png         system architecture diagram
  evidence/
    EvidenceNB.ipynb         run verification notebook
  framework/
    stream_framework.md
All incoming events are validated against a strict schema before rule evaluation.
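A strict, fail-fast contract check might look like the sketch below. The required fields and types are assumptions for illustration; the actual contract lives in src/contracts/event_contract.py:

```python
def validate_event(event):
    """Fail-fast contract check. Field names and types are illustrative
    assumptions, not the repository's actual event contract."""
    required = {
        "event_id": str,
        "user_id": str,
        "type": str,
        "amount": (int, float),
        "ts": (int, float),
    }
    for field, expected in required.items():
        if field not in event:
            raise ValueError(f"missing field: {field}")
        if not isinstance(event[field], expected):
            raise ValueError(f"bad type for field: {field}")
    if event["amount"] <= 0:
        raise ValueError("amount must be positive")
    return event
```

Raising on the first violation (rather than collecting warnings) is what allows the run to terminate immediately on bad input.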
Rules operate on event state with deterministic behavior for reproducibility.
Per-user event history enables velocity and repetition detection.
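A per-user sliding-window rule for deposit bursts could be sketched as follows. The thresholds, rule name, and field names are hypothetical; the repository's actual rules are in src/pipelines/rules.py:

```python
from collections import defaultdict, deque

class VelocityRule:
    """Flag a user who makes more than `max_events` deposits within
    `window_s` seconds. Thresholds and field names are illustrative
    assumptions, not the repository's actual rule configuration."""

    def __init__(self, max_events=3, window_s=60):
        self.max_events = max_events
        self.window_s = window_s
        self.history = defaultdict(deque)  # per-user timestamps, oldest first

    def evaluate(self, event):
        if event["type"] != "deposit":
            return None
        ts = event["ts"]
        q = self.history[event["user_id"]]
        q.append(ts)
        # Evict timestamps outside the window; using only event time keeps
        # evaluation deterministic across re-runs.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        if len(q) > self.max_events:
            return {"rule": "rapid_deposit_burst", "user_id": event["user_id"], "ts": ts}
        return None
```

Keying state by event timestamps rather than wall-clock time is what makes replaying the same stream produce the same flags.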
Invalid JSON, schema violations, duplicate events, or out-of-order events terminate the run immediately (fail-fast).
The watcher persists byte offsets so processing resumes exactly where the previous run stopped.
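Offset-based resume can be sketched as below. This is a minimal illustration of the technique; the repository's watcher in src/pipelines/watcher.py may differ in file layout and error handling:

```python
import json
import os

def consume_new_events(stream_path, checkpoint_path):
    """Read only events appended since the last run, then persist the new
    byte offset. A minimal sketch of offset-based restart-resume."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            offset = json.load(f).get("offset", 0)
    events = []
    with open(stream_path, "rb") as f:
        f.seek(offset)                    # resume exactly where the last run stopped
        for line in f:
            events.append(json.loads(line))
        offset = f.tell()                 # new end-of-stream position
    with open(checkpoint_path, "w", encoding="utf-8") as f:
        json.dump({"offset": offset}, f)  # persist progress for the next run
    return events
```

Because the checkpoint stores a byte offset into an append-only file, a restarted run neither skips nor reprocesses events.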
Each run produces:
- run logs
- validation reports
- emitted flags
- checkpoint state
These artifacts allow independent verification of pipeline execution.
The project includes:
- contract validation tests
- rule evaluation tests
- generator behavior tests
- watcher smoke tests
Clone the repository.
git clone https://github.com/D-Atul/local-stream-data-product.git
cd local-stream-data-product
Create a virtual environment.
python -m venv .venv
source .venv/bin/activate
Install dependencies.
pip install -r requirements.txt
Generate a sample stream.
python -m src.stream.generator
Start the watcher.
python -m src.pipelines.watcher
Artifacts produced:
logs/ run metadata
outputs/checkpoints/ stream progress state
outputs/flags/ compliance flags
outputs/reports/ validation reports
Run all tests with:
pytest -q
The test suite covers contract validation, rule evaluation, generator behavior, and pipeline startup through a smoke test.
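A contract-style test in this suite might look like the following. The validated function here is a self-contained stand-in, not the repository's actual contract code; pytest discovers plain functions named `test_*` without any imports:

```python
def validate_amount(amount):
    """Stand-in for a contract check (hypothetical; the real tests
    target the code under src/contracts)."""
    if not isinstance(amount, (int, float)) or amount <= 0:
        raise ValueError("amount must be a positive number")
    return amount

def test_valid_amount_passes():
    assert validate_amount(250.0) == 250.0

def test_non_positive_amount_fails_fast():
    # Fail-fast contract: invalid input must raise, not warn.
    try:
        validate_amount(-5)
        assert False, "expected ValueError"
    except ValueError:
        pass
```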
Example artifacts generated by the pipeline:
Run log:
logs/run_success.json
Checkpoint state:
outputs/checkpoints/state_success.json
Compliance flags:
outputs/flags/flag_success.log
Validation report:
outputs/reports/validation_success.json
These artifacts demonstrate deterministic processing and reproducible pipeline behavior.
The notebook in evidence/ verifies execution artifacts and demonstrates:
- run metadata inspection
- stream offset verification
- restart-resume correctness
- fail-fast validation behavior
This provides a transparent verification layer for pipeline execution.
This project demonstrates core data engineering practices required for real streaming systems:
- event stream consumption
- contract-based validation
- stateful rule evaluation
- restart-safe processing
- deterministic outputs
- reproducible execution artifacts
- automated testing discipline
Together, these elements represent the design of a production-minded streaming compliance data product.
