LensDB: Traffic Video Analytics Pipeline

This project implements a video analytics system for semantic retrieval and object counting. It uses CLIP embeddings for semantic understanding, keyframe selection algorithms for data compression, and lightweight MLPs for count prediction. It includes pipelines for training, FAISS indexing, and benchmarking against YOLO baselines.

Prerequisites

Python 3.10+
CUDA-enabled GPU (recommended)
FFmpeg (for video decoding via av and opencv)

Installation

This project uses Poetry for dependency management.

Clone the repository:

git clone <repository_url>
cd traffic-video-pipeline

Install dependencies:

poetry install

Alternatively, if using pip:

pip install torch torchvision opencv-python transformers ultralytics faiss-cpu av pandas numpy pycocotools scikit-learn tabulate matplotlib

Set up the source path: from the repository root, add the current directory (the one containing src) to your PYTHONPATH.
```
export PYTHONPATH=$PYTHONPATH:$(pwd)
```

Data Setup

1. VIRAT Video Dataset

This project uses the VIRAT Video Dataset for benchmarking and fine-tuning.

Download the VIRAT Video Dataset (Release 2.0) videos. You can find them at the official website or standard dataset repositories.
Create a directory for source videos:
```
mkdir -p videos_source
```
Place the downloaded .mp4 files into videos_source/.

2. COCO Dataset (Pre-training)

If you intend to pre-train the count predictor, download the COCO 2017 dataset. A helper script is provided:

python src/training/download_coco.py

This will download and extract data to data/coco/.

Directory Structure

The pipeline generates artifacts in a structured data/ directory.

traffic-video-pipeline/
├── data/
│   ├── coco/                  # COCO dataset (images/annotations)
│   └── VIRAT/                 # Processed VIRAT data
│       ├── VIRAT_S_000001/    # Per-video directory
│       │   ├── counts.csv     # Ground truth counts (generated by YOLO)
│       │   ├── frames/        # Extracted frames (optional)
│       │   ├── embeddings/    # CLIP embeddings (.npy) and metadata
│       │   └── keyframes/     # Selected keyframe images
│       └── ...
├── models/
│   └── checkpoints/           # Saved model weights (.pth)
├── src/                       # Source code
├── videos_source/             # Raw input MP4 files
└── ...
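
As a rough illustration of the layout above, the per-video artifact paths can be derived from a video name. The helper `video_artifacts` is hypothetical and not part of the repository; only the directory and file names are taken from the tree.

```python
from pathlib import Path


def video_artifacts(video_name: str, data_root: str = "data/VIRAT") -> dict:
    """Return the expected artifact paths for one processed video.

    Hypothetical helper: it only mirrors the documented directory tree.
    """
    video_dir = Path(data_root) / video_name
    return {
        "counts": video_dir / "counts.csv",      # ground truth counts
        "frames": video_dir / "frames",          # extracted frames (optional)
        "embeddings": video_dir / "embeddings",  # CLIP embeddings and metadata
        "keyframes": video_dir / "keyframes",    # selected keyframe images
    }


paths = video_artifacts("VIRAT_S_000001")
print(paths["counts"].as_posix())
```

Keeping path construction in one place like this makes it easier to check which artifacts already exist for a video before rerunning a pipeline stage.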

Usage

1. Ground Truth Generation

Before training or benchmarking, generate ground truth counts with a high-accuracy object detector (YOLOv8 or YOLO11).

Edit src/main_train.py so that it calls run_detection_on_dir on your videos_source directory:

# Inside src/main_train.py
run_detection_on_dir(
    videos_dir="videos_source", 
    model_name="yolov8l", 
    annotated=False
)

Run the script:

python src/main_train.py

This creates counts.csv files in data/VIRAT/<video_name>/.
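
To make the idea of detector-derived ground truth concrete, the sketch below turns per-frame detection labels into per-frame counts and writes them as CSV. The detection input, the column names (`frame`, `car`, `person`), and both function names are assumptions for illustration; the actual counts.csv schema written by run_detection_on_dir may differ.

```python
import csv
import io
from collections import Counter


def counts_rows(detections, classes):
    """Convert {frame_index: [class labels]} into one count row per frame."""
    rows = []
    for frame in sorted(detections):
        tally = Counter(detections[frame])
        row = {"frame": frame}
        for name in classes:
            row[name] = tally[name]  # Counter returns 0 for missing labels
        rows.append(row)
    return rows


def write_counts_csv(rows, classes, handle):
    """Write the rows with a header of frame index followed by class names."""
    writer = csv.DictWriter(handle, fieldnames=["frame", *classes])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical detector output: labels detected in each frame.
detections = {0: ["car", "car", "person"], 1: ["car"], 2: []}
rows = counts_rows(detections, classes=["car", "person"])

buffer = io.StringIO()  # stands in for an open counts.csv file
write_counts_csv(rows, ["car", "person"], buffer)
print(buffer.getvalue())
```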

2. Training the Count Predictor

The system uses an MLP to predict object counts from CLIP embeddings.
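
The forward pass of such a predictor can be sketched without any dependencies. This is not the project's CountPredictor, which lives in src/models/ and may be structured differently; the layer sizes, the initialisation, and all function names here are assumptions chosen only to show the shape of the computation (embedding in, scalar count out).

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_count(embedding, layers):
    """Hidden layers use ReLU; the final layer outputs a single scalar."""
    x = embedding
    for weights, bias in layers[:-1]:
        x = relu(linear(x, weights, bias))
    weights, bias = layers[-1]
    return linear(x, weights, bias)[0]


rng = random.Random(0)
embedding_dim, hidden_dim = 512, 64  # assumed sizes, not taken from the project
layers = [init_layer(embedding_dim, hidden_dim, rng), init_layer(hidden_dim, 1, rng)]

# A random vector stands in for a real CLIP embedding.
embedding = [rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
count = predict_count(embedding, layers)
print(round(count, 3))
```

With untrained weights the output is meaningless; the pre-training and fine-tuning steps below are what make it track the ground truth counts.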

Pre-training (COCO):

# Inside src/main_train.py
pretrain_on_coco(
    coco_dir="data/coco",
    target="car",
    model_config=LARGE3
)

Fine-tuning (VIRAT):

# Inside src/main_train.py
finetune_on_virat(
    data_dir="data/VIRAT",
    target="car",
    pretrained_checkpoint="models/checkpoints/car_coco_pretrained.pth",
    model_config=LARGE3
)

3. Benchmarking & Keyframe Selection

To evaluate different pipeline configurations (standard embedding vs. keyframe-based) and keyframe selection algorithms (FrameDiff, SSIM, MOG2, Flow), run the comprehensive test suite:

python src/comprehensive_test.py

Arguments in src/comprehensive_test.py allow you to configure:

keyframe_selectors: List of methods to test.
keyframe_params: Parameters for selectors (e.g., k_mad, min_spacing).
test_keyframes: Boolean to enable/disable keyframe logic.
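
One plausible reading of the k_mad and min_spacing parameters is sketched below for a frame-difference selector: a frame is kept when its change score exceeds the median score by k_mad median absolute deviations, and selected frames must be at least min_spacing frames apart. This interpretation, the function names, and the choice to always keep the first frame are assumptions; the selectors in src/keyframe/ may work differently.

```python
from statistics import median


def frame_diff_scores(frames):
    """Mean absolute pixel difference between consecutive frames.

    Each frame is a flat list of grayscale values; the first frame scores 0.0.
    """
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        scores.append(sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur))
    return scores


def select_keyframes(scores, k_mad=3.0, min_spacing=2):
    """Keep frames scoring above median + k_mad * MAD, spaced min_spacing apart."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    threshold = med + k_mad * mad
    selected = [0]  # assumption: the first frame is always a keyframe
    for i in range(1, len(scores)):
        if scores[i] > threshold and i - selected[-1] >= min_spacing:
            selected.append(i)
    return selected


# Toy 4-pixel frames: a scene change at index 3 and another at index 6.
frames = [
    [10, 10, 10, 10], [10, 10, 10, 11], [10, 10, 10, 10],
    [80, 80, 80, 80], [80, 80, 80, 81], [80, 80, 80, 80],
    [10, 10, 10, 10],
]
scores = frame_diff_scores(frames)
print(select_keyframes(scores))  # [0, 3, 6]
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold from being dragged upward by the few large scene-change scores it is meant to detect.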

4. Semantic Querying

To perform semantic retrieval (e.g., "Find frames with > 2 cars"):

# Inside src/main_train.py
results = evaluate_retrieval(
    data_dir="data/VIRAT",
    checkpoint_path="models/checkpoints/car_virat_finetuned.pth",
    target="car",
    count_threshold=2
)
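
Conceptually, a count query of this kind reduces to thresholding per-frame predicted counts and comparing the retrieved frames with the ground truth. The sketch below illustrates that idea only; the function names and sample values are hypothetical, and evaluate_retrieval may compute different or additional metrics.

```python
def retrieve_frames(counts, count_threshold):
    """Return indices of frames whose count is strictly above the threshold."""
    return [i for i, c in enumerate(counts) if c > count_threshold]


def precision_recall(retrieved, relevant):
    """Precision and recall of the retrieved frame set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


predicted = [0.4, 2.6, 3.1, 1.9, 4.8]  # hypothetical MLP outputs per frame
truth = [0, 3, 3, 3, 5]                # hypothetical ground truth counts

retrieved = retrieve_frames(predicted, count_threshold=2)
relevant = retrieve_frames(truth, count_threshold=2)
precision, recall = precision_recall(retrieved, relevant)
print(retrieved, precision, recall)  # [1, 2, 4] 1.0 0.75
```

Frame 3 is missed because its count was under-predicted (1.9 instead of 3), which shows how under-prediction directly costs recall in this kind of query.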

Key Modules

src/keyframe/: Contains pre-selection logic.
- FrameDiffPreselector: Based on pixel difference.
- SSIMPreselector: Based on Structural Similarity Index.
- MOG2Preselector: Based on Background Subtraction.
- FlowPreselector: Based on Optical Flow magnitude.
src/embeddings/: CLIP embedding generation (supports OpenAI CLIP and MobileCLIP).
src/indexing/: FAISS index construction and flat-file management.
src/models/: MLP architecture (CountPredictor) definition.
src/training/: Training loops with loss handling for under-prediction penalties.
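
An under-prediction penalty can be expressed as an asymmetric loss that weights errors more heavily when the predicted count falls below the target. The sketch below is one simple way to do this; the squared-error form, the `under_weight` parameter, and the function names are assumptions, not the loss implemented in src/training/.

```python
def asymmetric_squared_error(predicted, target, under_weight=2.0):
    """Squared error, scaled by under_weight when the prediction is too low."""
    error = predicted - target
    weight = under_weight if error < 0 else 1.0
    return weight * error * error


def mean_loss(predictions, targets, under_weight=2.0):
    """Average the asymmetric loss over a batch of predictions."""
    losses = [
        asymmetric_squared_error(p, t, under_weight)
        for p, t in zip(predictions, targets)
    ]
    return sum(losses) / len(losses)


print(asymmetric_squared_error(2.0, 3.0))    # under-prediction: 2.0
print(asymmetric_squared_error(4.0, 3.0))    # over-prediction: 1.0
print(mean_loss([2.0, 4.0], [3.0, 3.0]))     # 1.5
```

A loss of this shape pushes the predictor to err on the high side, which is a reasonable bias when missed frames are costlier than false positives in threshold queries.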
