RajShah-1/lensdb

LensDB: Traffic Video Analytics Pipeline

This project implements a video analytics system for semantic retrieval and object counting in traffic video. It uses CLIP embeddings for semantic understanding, keyframe selection algorithms for data compression, and lightweight MLPs for count prediction. It includes pipelines for training, FAISS indexing, and benchmarking against YOLO baselines.

Prerequisites

  • Python 3.10+
  • CUDA-enabled GPU (recommended)
  • FFmpeg (for video decoding via av and opencv)

Installation

This project uses Poetry for dependency management.

  1. Clone the repository:

    git clone <repository_url>
    cd traffic-video-pipeline
  2. Install dependencies:

    poetry install

    Alternatively, if using pip:

    pip install torch torchvision opencv-python transformers ultralytics faiss-cpu av pandas numpy pycocotools scikit-learn tabulate matplotlib
  3. Set up the source path: Ensure the src directory is importable. The command below, run from the repository root, appends the root to your PYTHONPATH.

    export PYTHONPATH=$PYTHONPATH:$(pwd)

Data Setup

1. VIRAT Video Dataset

This project uses the VIRAT Video Dataset for benchmarking and fine-tuning.

  1. Download the VIRAT Video Dataset (Release 2.0) videos. You can find them at the official website or standard dataset repositories.
  2. Create a directory for source videos:
    mkdir -p videos_source
  3. Place the downloaded .mp4 files into videos_source/.

2. COCO Dataset (Pre-training)

If you intend to pre-train the count predictor, download the COCO 2017 dataset. A helper script is provided:

python src/training/download_coco.py

This will download and extract data to data/coco/.

Directory Structure

The pipeline generates artifacts in a structured data/ directory.

traffic-video-pipeline/
├── data/
│   ├── coco/                  # COCO dataset (images/annotations)
│   └── VIRAT/                 # Processed VIRAT data
│       ├── VIRAT_S_000001/    # Per-video directory
│       │   ├── counts.csv     # Ground truth counts (generated by YOLO)
│       │   ├── frames/        # Extracted frames (optional)
│       │   ├── embeddings/    # CLIP embeddings (.npy) and metadata
│       │   └── keyframes/     # Selected keyframe images
│       └── ...
├── models/
│   └── checkpoints/           # Saved model weights (.pth)
├── src/                       # Source code
├── videos_source/             # Raw input MP4 files
└── ...

Usage

1. Ground Truth Generation

Before training or benchmarking, generate ground truth counts using a high-accuracy object detector (YOLOv8/11).

Modify src/main_train.py so that it points to your videos_source directory and calls run_detection_on_dir:

# Inside src/main_train.py
run_detection_on_dir(
    videos_dir="videos_source", 
    model_name="yolov8l", 
    annotated=False
)

Run the script:

python src/main_train.py

This creates counts.csv files in data/VIRAT/<video_name>/.
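
Before moving on to training, it can help to confirm that every per-video directory actually received a counts.csv. The following is a minimal sketch using only the standard library; the helper name videos_missing_counts is hypothetical and not part of the repository, and it assumes only the data/VIRAT/<video_name>/counts.csv layout described above.

```python
from pathlib import Path


def videos_missing_counts(data_dir="data/VIRAT", filename="counts.csv"):
    """Return the names of per-video directories that lack a counts file.

    Hypothetical helper: it only inspects the directory layout and does
    not read or validate the contents of counts.csv.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    missing = []
    for video_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (video_dir / filename).is_file():
            missing.append(video_dir.name)
    return missing


if __name__ == "__main__":
    for name in videos_missing_counts():
        print(f"no counts.csv for {name}")
```

An empty result means every directory under data/VIRAT contains a counts.csv; it says nothing about whether the counts themselves are complete.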

2. Training the Count Predictor

The system uses an MLP to predict object counts from CLIP embeddings.
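
To make the idea concrete, here is a dependency-free sketch of an MLP forward pass that maps an embedding vector to a non-negative count. It is an illustration only: the layer sizes, the random weights, and the clamp at zero are assumptions, and the project's CountPredictor in src/models/ (presumably a PyTorch module, given the .pth checkpoints) may be structured differently.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply an affine transform: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def predict_count(layers, embedding):
    """Forward pass: ReLU after each hidden layer, scalar output clamped at zero."""
    h = embedding
    for layer in layers[:-1]:
        h = relu(dense(layer, h))
    (raw,) = dense(layers[-1], h)  # the last layer must have exactly one unit
    return max(0.0, raw)           # counts cannot be negative (assumed clamp)


rng = random.Random(0)
EMBED_DIM, HIDDEN = 512, 64  # hypothetical sizes, not taken from the repository
layers = [make_layer(EMBED_DIM, HIDDEN, rng), make_layer(HIDDEN, 1, rng)]
embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
print(predict_count(layers, embedding))
```

With untrained random weights the printed value is meaningless; training fits the weights so that the output tracks the ground truth counts.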

Pre-training (COCO):

# Inside src/main_train.py
pretrain_on_coco(
    coco_dir="data/coco",
    target="car",
    model_config=LARGE3
)

Fine-tuning (VIRAT):

# Inside src/main_train.py
finetune_on_virat(
    data_dir="data/VIRAT",
    target="car",
    pretrained_checkpoint="models/checkpoints/car_coco_pretrained.pth",
    model_config=LARGE3
)

3. Benchmarking & Keyframe Selection

To evaluate different pipeline configurations (Standard Embedding vs. Keyframe-based) and keyframe selection algorithms (FrameDiff, SSIM, MOG2, Flow), run the comprehensive test suite:

python src/comprehensive_test.py

Arguments in src/comprehensive_test.py allow you to configure:

  • keyframe_selectors: List of methods to test.
  • keyframe_params: Parameters for selectors (e.g., k_mad, min_spacing).
  • test_keyframes: Boolean to enable/disable keyframe logic.
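
The parameter names k_mad and min_spacing suggest outlier-based selection over a per-frame change score. The sketch below shows one plausible reading, assuming k_mad scales the median absolute deviation (MAD) and min_spacing is a minimum gap in frames between selections. The function select_keyframes is hypothetical; the selectors in src/keyframe/ may interpret these parameters differently.

```python
from statistics import median


def select_keyframes(scores, k_mad=3.0, min_spacing=5):
    """Pick frame indices whose change score is an outlier.

    A frame qualifies when score > median + k_mad * MAD and at least
    min_spacing frames have passed since the last selected frame.
    (Assumed semantics for k_mad and min_spacing.)
    """
    if not scores:
        return []
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    threshold = med + k_mad * mad
    selected = []
    for i, score in enumerate(scores):
        if score <= threshold:
            continue  # not enough change relative to the rest of the video
        if selected and i - selected[-1] < min_spacing:
            continue  # too close to the previously selected keyframe
        selected.append(i)
    return selected


# Per-frame change scores, e.g. mean absolute pixel difference (made-up values).
scores = [1.0, 1.1, 0.9, 1.0, 9.0, 8.5, 1.0, 1.2, 0.8, 1.0, 1.1, 7.0]
print(select_keyframes(scores, k_mad=3.0, min_spacing=5))  # [4, 11]
```

Frame 5 also exceeds the threshold but is dropped because it falls within min_spacing of frame 4. Using the median and MAD rather than the mean and standard deviation keeps a few large spikes from inflating the threshold.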

4. Semantic Querying

To perform semantic retrieval (e.g., "Find frames with > 2 cars"):

# Inside src/main_train.py
results = evaluate_retrieval(
    data_dir="data/VIRAT",
    checkpoint_path="models/checkpoints/car_virat_finetuned.pth",
    target="car",
    count_threshold=2
)
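
Conceptually, a count query of this kind reduces to thresholding per-frame predicted counts and comparing the retrieved set with the ground truth. The sketch below illustrates that idea with made-up numbers. Both helper functions are hypothetical, and the strict "greater than" comparison is an assumption based on the "> 2 cars" example; the actual evaluate_retrieval may compute different metrics.

```python
def retrieve_frames(predicted_counts, count_threshold):
    """Return indices of frames whose predicted count exceeds the threshold."""
    return [i for i, c in enumerate(predicted_counts) if c > count_threshold]


def precision_recall(retrieved, true_counts, count_threshold):
    """Score retrieved frame indices against ground-truth counts."""
    relevant = {i for i, c in enumerate(true_counts) if c > count_threshold}
    hits = relevant.intersection(retrieved)
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


predicted = [0.4, 2.6, 3.1, 1.9, 2.2]  # made-up MLP outputs per frame
truth = [0, 3, 3, 3, 1]                # made-up ground truth counts
retrieved = retrieve_frames(predicted, count_threshold=2)
print(retrieved)  # [1, 2, 4]
print(precision_recall(retrieved, truth, count_threshold=2))
```

In this example frame 3 is a miss caused by under-prediction (1.9 predicted, 3 actual), and frame 4 is a false positive.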

Key Modules

  • src/keyframe/: Contains pre-selection logic.
    • FrameDiffPreselector: Based on pixel difference.
    • SSIMPreselector: Based on Structural Similarity Index.
    • MOG2Preselector: Based on Background Subtraction.
    • FlowPreselector: Based on Optical Flow magnitude.
  • src/embeddings/: CLIP embedding generation (supports OpenAI CLIP and MobileCLIP).
  • src/indexing/: FAISS index construction and flat-file management.
  • src/models/: MLP architecture (CountPredictor) definition.
  • src/training/: Training loops with loss handling for under-prediction penalties.
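
As an illustration of what an under-prediction penalty can look like, the sketch below weights squared errors more heavily when the prediction falls below the target. This is one common asymmetric loss, not necessarily the one implemented in src/training/; the function name and the under_weight parameter are hypothetical.

```python
def asymmetric_squared_error(predictions, targets, under_weight=2.0):
    """Mean squared error that weights under-predictions more heavily.

    under_weight multiplies the squared error whenever prediction < target;
    over-predictions keep a weight of 1.0.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one sample is required")
    total = 0.0
    for pred, target in zip(predictions, targets):
        error = pred - target
        weight = under_weight if error < 0 else 1.0
        total += weight * error * error
    return total / len(predictions)


# One under-prediction (1 vs 3) and one over-prediction (4 vs 2), same magnitude.
print(asymmetric_squared_error([1.0, 4.0], [3.0, 2.0]))  # 6.0
```

A plain mean squared error would give 4.0 here. The asymmetric version pushes the model toward over-counting rather than under-counting, which is a reasonable bias when a missed frame is costlier than a false positive during retrieval.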
