AES AIMLA Challenge 2025 System by Team AudioAlchemy

Query-by-Vocal Imitation Challenge

Query by Vocal Imitation (QVIM) enables users to search a database of sounds via a vocal imitation of the desired sound. This offers sound designers an intuitively expressive way of navigating large sound effects databases.
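
As a rough illustration of the retrieval idea, the sketch below ranks database sounds by cosine similarity between their embeddings and the embedding of a vocal imitation. The function names, the dictionary layout, and the toy 3-dimensional embeddings are assumptions made for this example; in a real system the embeddings would come from a trained audio model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_sounds(query_embedding, database):
    """Return sound names sorted from most to least similar to the query.

    `database` maps a sound name to its embedding (hypothetical layout).
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy embeddings standing in for real model outputs (assumed values).
database = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "door_slam.wav": [0.0, 1.0, 0.2],
    "rain.wav": [0.1, 0.2, 0.9],
}
imitation = [0.8, 0.2, 0.1]  # embedding of a vocal imitation of a dog bark

ranking = rank_sounds(imitation, database)
print(ranking)
```

The imitation embedding points in nearly the same direction as the dog bark embedding, so that file is ranked first.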

This is our code for this challenge.

Important Dates

  • Challenge start: April 1, 2025
  • Challenge end: June 15, 2025
  • Challenge results announcement: July 15, 2025

For more details, please have a look at our website.

Baseline System

This repository contains the modified baseline system for the AES AIMLA Challenge 2025. The architecture and the training procedure are based on "Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining" (DCASE 2024 Workshop).
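
To give a feel for what contrastive training optimizes, here is a minimal InfoNCE-style loss in plain Python. It assumes that imitation `i` in a batch matches reference sound `i` and that every other reference in the batch is a negative. The function name, the temperature value, and the toy vectors are assumptions for illustration; the loss actually used in `train.py` may differ.

```python
import math


def contrastive_loss(imitation_embs, reference_embs, temperature=0.1):
    """Average cross-entropy over cosine-similarity scores.

    For imitation i, reference i is the positive and all other
    references in the batch act as negatives (assumed pairing).
    """
    def normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    imitations = [normalize(v) for v in imitation_embs]
    references = [normalize(v) for v in reference_embs]

    total = 0.0
    for i, query in enumerate(imitations):
        # Scaled cosine similarity of this imitation to every reference.
        logits = [
            sum(a * b for a, b in zip(query, ref)) / temperature
            for ref in references
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(imitations)


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)
```

The loss is close to zero when each imitation is most similar to its own reference and grows when the pairs are mismatched.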

Getting Started

Prerequisites

  1. Clone this repository.
git clone https://github.com/qvim-aes/qvim-baseline.git
  2. Create and activate the conda environment defined in environment.yml (Python 3.10):
conda env create -f environment.yml
conda activate qvim-ensemble
  3. Install 7z, e.g.,
# (on Linux)
sudo apt install p7zip-full
# (on Windows)
conda install -c conda-forge 7zip

For Linux users: do not use the conda package p7zip. It is based on the outdated version 16.02 of 7zip; extracting the dataset requires a more recent version of p7zip.

  4. If you have not used Weights and Biases for logging before, you can create a free account. On your machine, run wandb login and paste your API key into the command line when prompted.

Training

All training is handled by the unified script src/qvim_mn_baseline/train.py. It is highly configurable and supports multiple model architectures and data augmentation strategies.

To see all available options, run:

export PYTHONPATH=$(pwd)/src
python src/qvim_mn_baseline/train.py --help

Usage Examples

1. Train the MobileNetV3 Baseline (Original)

This command replicates the original baseline without advanced augmentations.

python src/qvim_mn_baseline/train.py \
    --model_type mobilenet \
    --project "qvim-experiments" \
    --model_save_path "checkpoints"

2. Train a PaSST Model with "Light" Augmentations

This example trains the PaSST model with the "light" augmentation profile enabled.

python src/qvim_mn_baseline/train.py \
    --model_type passt \
    --use_augmentations true \
    --aug_profile light \
    --batch_size 12 \
    --n_epochs 50 \
    --project "qvim-experiments" \
    --model_save_path "checkpoints"

3. Train a BEATs Model with "Full" Augmentations and SpecMix

This command trains the BEATs model using the "full" augmentation profile, which includes SpecMix.

python src/qvim_mn_baseline/train.py \
    --model_type beats \
    --beats_checkpoint_path "path/to/your/BEATs_iter3.pt" \
    --use_augmentations true \
    --aug_profile full \
    --batch_size 8 \
    --n_epochs 75 \
    --project "qvim-experiments" \
    --model_save_path "checkpoints"

Evaluation Results

| Model Name     | MRR (exact match) |
|----------------|-------------------|
| random         | 0.0444            |
| MN baseline    | 0.2726            |
| MN + Light Aug | 0.2835            |
| PaSST          | 0.1502            |
| PANNs          | 0.1577            |
| BEATs          | 0.2309            |
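
For readers unfamiliar with the metric, the sketch below computes Mean Reciprocal Rank (MRR): for each query, take 1 divided by the rank of the correct item, then average over all queries. It assumes that "exact match" means only the one reference sound that was imitated counts as correct. The function name and the toy rankings are hypothetical and not taken from the evaluation code in this repository.

```python
def mean_reciprocal_rank(rankings, correct_items):
    """Mean of 1/rank of the correct item over all queries.

    rankings: one ranked list of item names per query (best first).
    correct_items: the single correct item for each query.
    A query whose correct item is missing from its ranking scores 0.
    """
    reciprocal_ranks = []
    for ranked, correct in zip(rankings, correct_items):
        if correct in ranked:
            reciprocal_ranks.append(1.0 / (ranked.index(correct) + 1))
        else:
            reciprocal_ranks.append(0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Three toy queries whose correct item "a" lands at rank 1, 2, and 3.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
correct = ["a", "a", "a"]

mrr = mean_reciprocal_rank(rankings, correct)  # (1 + 1/2 + 1/3) / 3
print(round(mrr, 4))
```

An MRR of 1.0 would mean the correct sound is always ranked first; values near 0 mean it is typically buried far down the list.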

Contact

For questions or inquiries, please contact rahul.peter@aalto.fi or vivek.mohan@aalto.fi.

Citation

@inproceedings{Greif2024,
    author = "Greif, Jonathan and Schmid, Florian and Primus, Paul and Widmer, Gerhard",
    title = "Improving Query-By-Vocal Imitation with Contrastive Learning and Audio Pretraining",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)",
    address = "Tokyo, Japan",
    month = "October",
    year = "2024",
    pages = "51--55"
}
