Candibulldog/Long-Tailed-Object-Detection

Long-Tailed Object Detection for Drone-Based Intelligent Counting

This repository contains a complete, high-performance solution for the CVPDL Homework #2: Long-Tailed Object Detection challenge. The project focuses on detecting objects (car, hov, person, motorcycle) in drone-based imagery, with a specific emphasis on addressing the severe class imbalance inherent in the dataset.

The entire system is built from the ground up, adhering to the strict "no pre-trained weights" constraint. It employs a sophisticated data-centric pipeline, a custom-modified YOLOv8 architecture, and an advanced post-processing strategy to maximize performance on the private leaderboard.

Key Features

  • Advanced Data-Centric Strategy: Implements a multi-faceted Repeat Factor Sampler (RFS) to combat class imbalance. The sampler intelligently oversamples images based not only on rare class frequency but also on the presence of small objects, with fine-grained controls via class-specific and size-aware boosts.
  • Custom YOLOv8 Architecture for Small Objects: Utilizes a modified YOLOv8-P2 model architecture, which incorporates an additional detection head at the P2 feature level. This significantly improves the model's ability to detect small objects like person and motorcycle, which are critical in aerial imagery.
  • Attention-Enhanced Models: Includes model configurations with the Convolutional Block Attention Module (CBAM), allowing the network to focus on more informative spatial and channel-wise features, further enhancing detection accuracy.
  • Strategic Post-Processing Pipeline: Employs per-class confidence thresholds and minimum-size filters during inference. This allows for a surgical approach to balancing precision and recall—using lower thresholds for rare classes and higher thresholds for common ones to suppress false positives.
  • Robust & OOM-Safe Inference Engine: The prediction script is engineered for efficiency and stability. It features automatic batch size backoff to gracefully handle potential CUDA Out-Of-Memory errors, ensuring that inference can complete even on resource-constrained systems.
  • Weighted Boxes Fusion (WBF) for Ensembling: Includes a ready-to-use script for ensembling predictions from multiple models. WBF is superior to traditional Non-Maximum Suppression (NMS) for combining bounding boxes, leading to a significant boost in final submission scores.
  • Comprehensive Analysis & Visualization Tools: Ships with eda.py for deep dataset analysis and visualize.py for qualitatively inspecting model predictions, turning raw numbers into actionable insights.
  • Highly Modular and Configurable Codebase: All critical parameters—from data preparation and augmentation to model training and inference—are centralized in a single src/config.py file, making experimentation and tuning fast and systematic.
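The OOM-safe backoff described above boils down to a small retry loop: catch the CUDA out-of-memory error, halve the batch size, and try again. A minimal sketch of the idea; `predict_with_backoff` and its arguments are illustrative names, not the actual API of predict.py:

```python
def predict_with_backoff(run_inference, batch_size, min_batch=1):
    """Call run_inference(batch_size), halving batch_size whenever an
    out-of-memory RuntimeError is raised, until inference succeeds or
    min_batch is reached. Illustrative sketch, not the predict.py code."""
    while True:
        try:
            return run_inference(batch_size)
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or if we cannot shrink further.
            if "out of memory" not in str(err).lower() or batch_size <= min_batch:
                raise
            batch_size = max(min_batch, batch_size // 2)  # back off and retry
```

In the real script the retry would also call `torch.cuda.empty_cache()` before re-running, so the freed activations are actually returned to the allocator.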

Methodology

The core of this project is a data-driven approach. Recognizing that "training from scratch" makes the model highly sensitive to data quality and distribution, the primary focus was on manipulating the data stream to present the model with a more balanced and challenging curriculum.

  1. Data Preparation (data_preparation.py):

    • Stratified Validation Split: Instead of a simple random split, a long-tail-aware splitter is used. It ensures the validation set has a representative distribution of rare classes and includes the most object-dense images, providing a more reliable performance metric.
    • Repeat Factor Sampling (RFS): A sophisticated playlist of training images is generated. Each image's repetition count is calculated based on:
      • Class Rarity: Images containing rarer classes are repeated more often.
      • Class-Specific Boosts: Manual multipliers to further increase sampling pressure on challenging classes like hov and person.
      • Small-Object-Aware Boost: An additional boost is applied to images containing a high ratio of small objects, ensuring the model gets sufficient training on these difficult cases.
  2. Model Architecture (models/yolov8s_p2.yaml & model.py):

    • The standard YOLO architecture was modified to include a P2 detection head. The FPN/PAN structure was extended to fuse features from the C2 layer (stride 4), providing a higher-resolution feature map ideal for small object detection.
    • Custom modules like CBAM can be seamlessly integrated into the YOLO architecture YAML, demonstrating a deep understanding of the Ultralytics framework.
  3. Training (train.py):

    • The model is trained from random initialization.
    • Aggressive data augmentation is used, including Mosaic, to create a wide variety of training scenes and prevent overfitting.
    • Custom callbacks (callbacks.py) are used to monitor performance, specifically logging the per-class AP@50 at each epoch to track progress on tail classes.
  4. Inference & Post-Processing (predict.py):

    • A low global confidence threshold is used to generate a large pool of candidate detections.
    • A second, more stringent filtering stage is applied using per-class confidence and size thresholds defined in config.py. This is a critical step for maximizing the final mAP score.
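The repeat-factor computation in step 1 follows the LVIS-style rule r(I) = max over classes c in I of max(1, sqrt(t / f_c)), extended with the class-specific and small-object boosts. A minimal sketch; the default threshold and boost values here are illustrative, and the tuned values live in src/config.py:

```python
import math
from collections import Counter

def repeat_factors(image_labels, thresh=0.1, class_boost=None,
                   small_boost=1.5, small_ratio=0.5):
    """LVIS-style repeat factors with the boosts from step 1 (sketch).

    image_labels: one (class_ids, small_object_ratio) pair per image.
    """
    class_boost = class_boost or {}
    n = len(image_labels)
    # f_c: fraction of training images that contain class c
    counts = Counter(c for ids, _ in image_labels for c in set(ids))
    # per-class factor r_c = max(1, sqrt(thresh / f_c)) * manual boost
    r_c = {c: max(1.0, math.sqrt(thresh / (cnt / n))) * class_boost.get(c, 1.0)
           for c, cnt in counts.items()}
    factors = []
    for ids, small in image_labels:
        r = max((r_c[c] for c in set(ids)), default=1.0)
        if small >= small_ratio:      # small-object-aware boost
            r *= small_boost
        factors.append(math.ceil(r))  # repeat count in the training playlist
    return factors
```

An image containing only a common class gets factor 1 (appears once in train.txt), while a rare-class image full of small objects is repeated several times.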

Project Structure

Long-Tailed-Object-Detection/
├── data/
│   ├── train/                  # Place raw training images and labels here
│   └── test/                   # Place raw testing images here
├── outputs/                    # Training runs, weights, and analysis results
├── logs/                       # Log files for training runs
├── src/
│   ├── models/
│   │   ├── yolov8s_p2.yaml     # Custom model architecture with P2 head
│   │   └── ...                 # Other custom model YAMLs
│   ├── __init__.py
│   ├── config.py               # Central configuration file for the entire project
│   ├── data_preparation.py     # Script to process data and generate RFS playlists
│   ├── model.py                # Defines custom nn.Modules (e.g., CBAM)
│   ├── callbacks.py            # Custom callbacks for logging during training
│   ├── train.py                # Main script to launch model training
│   ├── predict.py              # Script for running inference
│   └── ensemble.py             # Script to fuse predictions using WBF
├── tools/
│   ├── eda.py                  # Exploratory Data Analysis script
│   └── visualize.py            # Script to visualize model predictions
├── requirements.txt            # Project dependencies
└── README.md                   # This file

Setup and Installation

1. Clone the Repository

git clone https://github.com/Candibulldog/Long-Tailed-Object-Detection.git
cd Long-Tailed-Object-Detection

2. Create a Python Environment

It is highly recommended to use a virtual environment (e.g., venv or conda) to manage dependencies. This project requires Python >= 3.10.

# Using conda
conda create -n long_tail python=3.10
conda activate long_tail

3. Install PyTorch

PyTorch installation depends on your system's CUDA version. It is intentionally excluded from requirements.txt to ensure a correct installation. Please visit the official PyTorch website to find the appropriate command for your setup.

Example for CUDA 12.4:

conda install mkl==2023.1.0 pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

4. Install Other Dependencies

Once PyTorch is installed, install the remaining packages using the provided requirements.txt file.

pip install -r requirements.txt

5. Download and Place the Dataset

Download the dataset and place the contents into the data/ directory. The structure should be data/train/ and data/test/. The data_preparation.py script will automatically handle the creation of images and labels subdirectories if they don't exist.

Usage / Workflow

Follow these steps in order to replicate the results.

Step 1: Prepare the Data

This script is the cornerstone of the project. It organizes the raw data into the required images/ and labels/ structure, performs the stratified train/validation split, converts labels to YOLO format, and generates the crucial training playlists based on the RFS strategy defined in config.py. This step must be run first.

python -m src.data_preparation

This will create a data/yolo_dataset directory containing everything needed for training, and reorganize the raw data directories as follows:

data/
├── train/                  # Original raw training data remains untouched
│   ├── images/
│   └── labels/
├── test/                   # Original raw test data remains untouched
│   └── images/
│
└── yolo_dataset/           # <-- Directory generated by data_preparation.py
    ├── dataset.yaml        # Main config file for the YOLO trainer
    ├── train.txt           # "Playlist" of training images (with RFS repeats)
    ├── val.txt             # "Playlist" of validation images
    ├── images/
    │   ├── train/
    │   │   ├── img0001.png
    │   │   ├── img0002.png
    │   │   ├── img0003.png
    │   │   └── ...
    │   └── val/
    │       ├── img0951.png
    │       ├── img0952.png
    │       ├── img0953.png
    │       └── ...
    └── labels/
        ├── train/
        │   ├── img0001.txt # Converted to YOLO format (class_id xc yc w h)
        │   ├── img0002.txt
        │   ├── img0003.txt
        │   └── ...
        └── val/
            ├── img0951.txt # Converted to YOLO format
            ├── img0952.txt
            ├── img0953.txt
            └── ...
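The label conversion noted in the tree above rewrites each annotation as a normalized `class_id xc yc w h` line. A minimal sketch of the geometry, assuming the raw annotations are corner-format pixel boxes (the actual source format is whatever the homework dataset provides):

```python
def to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space corner box (x1, y1, x2, y2) to the normalized
    YOLO (xc, yc, w, h) layout written into the label files."""
    xc = (x1 + x2) / 2 / img_w          # box center, as a fraction of width
    yc = (y1 + y2) / 2 / img_h          # box center, as a fraction of height
    return xc, yc, (x2 - x1) / img_w, (y2 - y1) / img_h
```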

Step 2: (Optional but Recommended) Exploratory Data Analysis

After preparing the data, you can run the EDA script to gain insights into the dataset's characteristics. This helps in understanding the motivation behind the strategies used in this project.

python -m tools.eda

This will generate a series of analytical plots (class distribution, object sizes, etc.) and save them in the outputs/eda_analysis/ directory.

Step 3: Train the Model

Launch the training process using the train.py script. You can specify which model architecture to use and give your experiment a unique name. All outputs (weights, logs, etc.) will be saved under the outputs/ directory.

python -m src.train --cfg ./src/models/yolov8s_p2.yaml --name yolov8s_p2_run --batch 2
  • --cfg: Path to the model definition YAML.
  • --name: A unique name for this training run (required).
  • --batch: Adjust the batch size based on your GPU's VRAM (the project is optimized for <= 12GB).

Step 4: Run Inference

Once training is complete, use the predict.py script to generate a submission.csv file for the test set. Be sure to point to the best weights (best.pt) saved from your training run.

python -m src.predict --weights outputs/yolov8s_p2_run/weights/best.pt --output yolov8s_p2.csv
  • --weights: Path to the trained model weights.
  • --output: Filename for the final submission CSV.

You can also use the --classes argument to filter predictions for specific classes. This is highly useful for debugging or analyzing the performance on tail classes. Class IDs are: 0: car, 1: hov, 2: person, 3: motorcycle.

For example, to generate a submission containing only person and motorcycle predictions:

python -m src.predict --weights outputs/yolov8s_p2_run/weights/best.pt --output submission_person_mc_only.csv --classes 2 3

Step 5: Visualize Predictions

To qualitatively assess your model's performance, use the visualization script. It draws the predicted bounding boxes on the test images, helping you understand your model's strengths and weaknesses.

python -m tools.visualize --csv yolov8s_p2.csv --output-dir outputs/visualizations_run1
  • --csv: The submission file generated in the previous step.
  • --output-dir: A directory to save the visualized images, grid, and stats panel.

Step 6: (Optional) Ensemble Predictions

To achieve the best possible score, you can train multiple models (e.g., with different architectures or configurations) and ensemble their predictions.

  1. Run predict.py for each of your trained models, saving their outputs as different CSV files.
  2. Place all the generated CSV files into a single directory (e.g., src/predictions/).
  3. Run the ensemble.py script. It will automatically discover the CSVs, fuse them using Weighted Boxes Fusion, and create a final ensembled submission file.
# First, create the directory if it doesn't exist
mkdir -p src/predictions

# Move your individual prediction files into it
mv submission_model1.csv src/predictions/
mv submission_model2.csv src/predictions/
# ...

# Now, run the ensembling script
python -m src.ensemble

The final submission will be saved as wbf_submission.csv by default.
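For intuition, WBF replaces NMS's winner-takes-all with a score-weighted average over each cluster of overlapping boxes, so every model's vote moves the fused box. A minimal single-class sketch (ensemble.py presumably relies on a library implementation such as the ensemble-boxes package rather than this toy code):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def wbf(boxes, scores, iou_thr=0.55):
    """Fuse one class's boxes: boxes overlapping above iou_thr are averaged
    with their scores as weights instead of being suppressed.
    Returns a list of (fused_box, fused_score) pairs."""
    clusters = []  # each: {"box": fused box, "score": ..., "members": [...]}
    for i in sorted(range(len(boxes)), key=lambda i: -scores[i]):
        for cl in clusters:
            if iou(cl["box"], boxes[i]) >= iou_thr:
                cl["members"].append((boxes[i], scores[i]))
                # Refresh the fused box as the score-weighted average.
                total = sum(s for _, s in cl["members"])
                cl["box"] = [sum(b[k] * s for b, s in cl["members"]) / total
                             for k in range(4)]
                cl["score"] = total / len(cl["members"])  # mean confidence
                break
        else:
            clusters.append({"box": list(boxes[i]), "score": scores[i],
                             "members": [(boxes[i], scores[i])]})
    return [(cl["box"], cl["score"]) for cl in clusters]
```

Where NMS would keep only the highest-scoring box of a cluster, the fused box here is pulled toward wherever the higher-confidence models agree, which is what drives the ensemble gain.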

Configuration

The entire project is controlled by src/config.py. This file is heavily commented and organized for easy experimentation. Before running, you may want to review and adjust:

  • Training Parameters: TOTAL_EPOCHS, BATCH_SIZE, IMG_SIZE.
  • Long-Tail Strategy: Enable/disable USE_WEIGHTED_SAMPLER, and tune RFS_CLASS_SPECIFIC_BOOST and SMALLOBJ_CLASS_BOOST values.
  • Augmentation: MOSAIC_PROB, COPY_PASTE_PROB, etc.
  • Inference: PER_CLASS_CONFIDENCE_THRESHOLDS and MIN_WH_PER_CLASS are critical for maximizing your score.
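The two inference gates combine as a per-detection AND: a box survives only if it clears its class's confidence threshold and its class's minimum width/height. A sketch of the filtering logic; the dictionary values below are illustrative placeholders, not the tuned values in src/config.py:

```python
# Illustrative values only; the real ones live in src/config.py.
# Classes: 0 car, 1 hov, 2 person, 3 motorcycle.
PER_CLASS_CONFIDENCE_THRESHOLDS = {0: 0.40, 1: 0.25, 2: 0.15, 3: 0.20}
MIN_WH_PER_CLASS = {0: (8, 8), 1: (8, 8), 2: (2, 2), 3: (3, 3)}

def filter_detections(dets):
    """dets: (class_id, conf, x1, y1, x2, y2) tuples; keep only those that
    pass both the per-class confidence and minimum-size gates."""
    kept = []
    for cls, conf, x1, y1, x2, y2 in dets:
        min_w, min_h = MIN_WH_PER_CLASS.get(cls, (0, 0))
        if (conf >= PER_CLASS_CONFIDENCE_THRESHOLDS.get(cls, 0.25)
                and (x2 - x1) >= min_w and (y2 - y1) >= min_h):
            kept.append((cls, conf, x1, y1, x2, y2))
    return kept
```

Rare classes get low confidence bars so tail recall is preserved, while common classes get higher bars and size floors to suppress false positives.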
