MegaDetector-Acoustic

Microsoft AI for Good Lab's open-source AI for bioacoustic biodiversity monitoring — audio-based species detection and classification from passive acoustic recordings.

MegaDetector-Acoustic is part of the microsoft/Biodiversity ecosystem and is powered by the PyTorch-Wildlife framework. It is free, open-source, and available under the MIT license.

Part of the Biodiversity Ecosystem

MegaDetector-Acoustic is one model in a larger open-source ecosystem from the Microsoft AI for Good Lab. Each project lives in its own repository, with the microsoft/Biodiversity umbrella tying them together.

Repository	Description
microsoft/Biodiversity	The umbrella repository — documentation hub for the AI for Good Lab's biodiversity work
microsoft/MegaDetector	Animal, human, and vehicle detection for camera-trap images
microsoft/PytorchWildlife	The collaborative deep learning framework that hosts MegaDetector, species classifiers, and demo notebooks
microsoft/MegaDetector-Acoustic	This repo — bioacoustic AI for audio-based wildlife detection and classification
microsoft/MegaDetector-Overhead	Wildlife detection in aerial and drone imagery
microsoft/MegaDetector-Sonar	Sonar-based wildlife detection for aquatic monitoring
microsoft/MegaDetector-Classifier	Camera-trap species classification fine-tuning — adapt classifiers to your own datasets and geographic regions
microsoft/SPARROW	Solar-Powered Acoustic and Remote Recording Observation Watch — AI-enabled edge device for field recording
SPARROW Studio	Desktop application for all AI for Good Lab models

Overview

MegaDetector-Acoustic provides CLI scripts and training tools for audio-based wildlife detection and classification. The core deep learning infrastructure (models, datasets, spectrogram utilities) is provided by PyTorch-Wildlife.

Key capabilities:

Mel spectrogram generation from raw audio (GPU-accelerated)
Binary and multiclass species classification using ResNet backbones
Flexible YAML-based domain configuration
Sliding-window inference on long audio recordings
End-to-end pipeline from raw audio → model predictions → CSV output

Tested on:

PteroSet — tropical bird vocalizations from passive acoustic monitoring
CookInlet_Belugas — endangered Cook Inlet beluga whale detection

Installation

git clone https://github.com/microsoft/MegaDetector-Acoustic
cd MegaDetector-Acoustic
pip install -r requirements.txt

Requirements: Python 3.9+, PyTorch 2.0+

Quick Start

1. Configure your domain

Create a YAML config file for your domain (see template.yaml as reference):

name: "my_domain"
datasets:
  - "dataset_name_1"

class_names:
  0: "noise"
  1: "target_species"

paths:
  data_root: "${DATA_ROOT}"
  output_root: "${OUTPUT_ROOT}"

audio:
  sample_rate: 48000
  window_size_sec: 5.0
  overlap_sec: 4.0

spectrogram:
  n_fft: 2048
  hop_length: 512
  n_mels: 224

training:
  batch_size: 32
  lr: 0.0001
  epochs: 50
  backbone: "resnet18"
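
To make the audio and spectrogram settings concrete, the sketch below works out what the example values imply: how far each window advances, where the windows start in a recording, and how large one spectrogram is. It is illustrative arithmetic only. The function names are hypothetical, and the frame count assumes a centered STFT (the default in common audio libraries), which may differ from what the repository's spectrogram utilities actually do.

```python
import math

# Values copied from the example config above.
SAMPLE_RATE = 48000
WINDOW_SIZE_SEC = 5.0
OVERLAP_SEC = 4.0
HOP_LENGTH = 512
N_MELS = 224


def window_starts(duration_sec, window_sec=WINDOW_SIZE_SEC, overlap_sec=OVERLAP_SEC):
    """Return start times (seconds) of every full window that fits in a recording."""
    step = window_sec - overlap_sec  # how far each window advances
    if step <= 0:
        raise ValueError("overlap_sec must be smaller than window_size_sec")
    if duration_sec < window_sec:
        return []  # recording is shorter than one window
    count = int(math.floor((duration_sec - window_sec) / step)) + 1
    return [i * step for i in range(count)]


def spectrogram_shape(window_sec=WINDOW_SIZE_SEC, sample_rate=SAMPLE_RATE,
                      hop_length=HOP_LENGTH, n_mels=N_MELS):
    """Return (mel bins, time frames) for one window, assuming a centered STFT."""
    samples = int(round(window_sec * sample_rate))
    frames = 1 + samples // hop_length  # centered STFT pads both ends
    return n_mels, frames


if __name__ == "__main__":
    starts = window_starts(60.0)
    print(len(starts), starts[0], starts[-1])  # 56 windows, from 0.0 to 55.0
    print(spectrogram_shape())                 # (224, 469)
```

With a 5-second window and 4 seconds of overlap, consecutive windows advance by 1 second, so a 60-second recording yields 56 windows. Larger overlap gives finer time resolution at the cost of more inference passes.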

2. Prepare dataset

python prepare_dataset.py --config config/my_domain.yaml
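
The example config refers to `${DATA_ROOT}` and `${OUTPUT_ROOT}`. This README does not say how those placeholders are resolved; assuming they are read from environment variables, a small pre-flight check like the one below can catch an unset variable before the preparation step runs. The helper names are hypothetical and are not part of the repository.

```python
import os
import re

# Matches ${NAME} placeholders such as ${DATA_ROOT}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def missing_variables(text, env=None):
    """List placeholder names in `text` that have no value in `env`."""
    env = os.environ if env is None else env
    missing = []
    for name in PLACEHOLDER.findall(text):
        if name not in env and name not in missing:
            missing.append(name)
    return missing


def expand_placeholders(text, env=None):
    """Replace each ${NAME} with its value; raises KeyError if one is unset."""
    env = os.environ if env is None else env
    return PLACEHOLDER.sub(lambda match: env[match.group(1)], text)


if __name__ == "__main__":
    config_text = 'data_root: "${DATA_ROOT}"\noutput_root: "${OUTPUT_ROOT}"\n'
    unset = missing_variables(config_text)
    if unset:
        print("Set these environment variables first:", ", ".join(unset))
    else:
        print(expand_placeholders(config_text))
```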

3. Train

# Binary classification
python train.py --config config/my_domain.yaml \
    --train_csv train_split.csv \
    --val_csv val_split.csv \
    --test_csv test_split.csv

# Multiclass classification (e.g. 4 species)
python train.py --config config/my_domain.yaml \
    --train_csv train_split.csv \
    --test_csv test_split.csv \
    --num_classes 4

4. Run inference

python inference.py --config config/my_domain.yaml \
    --checkpoint model.ckpt \
    --audios_source /path/to/audio/folder \
    --dataset my_inference
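
Sliding-window inference produces one score per window, and heavily overlapping windows often fire on the same vocalization. The sketch below shows one possible post-processing step: merging consecutive above-threshold windows into events and writing them as CSV. It is not taken from `inference.py`. The threshold, the column names, and the function names are all assumptions for illustration, and the script's real output format may differ.

```python
import csv
import io


def merge_detections(scores, window_sec=5.0, step_sec=1.0, threshold=0.5):
    """Merge overlapping above-threshold windows into (start, end, max_score) events."""
    events = []
    for index, score in enumerate(scores):
        if score < threshold:
            continue
        start = index * step_sec
        end = start + window_sec
        if events and start <= events[-1][1]:
            # This window overlaps the previous event, so extend it.
            prev_start, prev_end, prev_score = events[-1]
            events[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            events.append((start, end, score))
    return events


def events_to_csv(events):
    """Serialize events to CSV text with a hypothetical three-column layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start_sec", "end_sec", "score"])
    writer.writerows(events)
    return buffer.getvalue()


if __name__ == "__main__":
    # One score per window, windows advancing by 1 second.
    window_scores = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7]
    print(events_to_csv(merge_detections(window_scores)))
```

Here the windows starting at 1 s and 2 s merge into a single event spanning 1 s to 7 s, while the window at 9 s becomes a separate event.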

Demo

The recommended way to get started is the end-to-end demo notebook at demo/bioacoustics_demo.ipynb. It uses real bird recordings from the PteroSet dataset and walks through:

Data exploration — annotation counts, species distribution
ONNX inference — download MD_AudioBirds_V1.onnx from Zenodo, run inference, visualise predictions vs. ground-truth
Training — binary classification (AVEVOC vs. noise) and multiclass (top-4 species) using ResNetClassifier

See demo/README.md for setup instructions and expected runtimes.

Repository Structure

MegaDetector-Acoustic/
├── train.py              # Training CLI script
├── inference.py          # Inference CLI script
├── prepare_dataset.py    # Dataset preparation pipeline
├── template.yaml         # Template configuration file
├── requirements.txt      # Python dependencies
└── demo/
    ├── bioacoustics_demo.ipynb   # End-to-end demo notebook
    ├── README.md
    ├── data/                     # Sample audio + annotations
    └── config/                   # Demo YAML configs

Citation

If you use MegaDetector-Acoustic in your research, please cite the PyTorch-Wildlife paper:

@misc{hernandez2024pytorchwildlife,
      title={Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation},
      author={Andres Hernandez and Zhongqi Miao and Luisa Vargas and Sara Beery and Rahul Dodhia and Juan Lavista},
      year={2024},
      eprint={2405.12930},
      archivePrefix={arXiv},
}

You can also cite this software directly using the citation.cff file in this repository.

Contributing

Issues, feature requests, and pull requests are welcome at microsoft/MegaDetector-Acoustic/issues.

For framework-level changes (PyTorch-Wildlife API, models, datasets), see microsoft/PytorchWildlife. For ecosystem-wide questions, see the microsoft/Biodiversity umbrella.

