
sadPororo/DCASE2022-Task3-SELD


A robust framework for sound event localization and detection on real recordings (DCASE2022 Challenge)

This repository provides the official training and testing code for the SE-ResNet34-BiGRU model, which won 3rd place in Task 3 (Sound Event Localization and Detection, SELD) of the DCASE 2022 Challenge.

Detailed information on our methodology, including the model architecture, can be found in our technical report, "A robust framework for sound event localization and detection on real recordings" (honored with the Judges' Award).

Key Approaches

  • Bootstrapping Training Batch: While external datasets were permitted, we observed that simply increasing the amount of simulated data often led to performance degradation. To mitigate this, we proposed a batch-balancing strategy that lets the model learn from diverse external sound samples while effectively retaining real-world context.

  • First Introduction of TTA in SELD: To further enhance performance, we introduced a Test-Time Augmentation (TTA) technique based on 16-pattern rotation of First-Order Ambisonics (FOA). This is the first instance of applying such a TTA strategy to the SELD task.
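The batch-balancing idea above can be sketched as a sampler that pins the real/synthetic ratio of every batch. This is an illustrative reimplementation, not the repository's exact code: the function name, the default ratio, and the pooling scheme are our assumptions.

```python
import itertools
import random

def balanced_batches(real_idx, synth_idx, batch_size=8, real_frac=0.25,
                     n_batches=None, seed=0):
    """Yield index batches that keep a fixed real/synthetic ratio.

    No matter how much external synthetic data is added, every batch
    contains round(batch_size * real_frac) samples drawn from the
    real-recording pool, so real-world context is never diluted.
    """
    rng = random.Random(seed)
    n_real = max(1, round(batch_size * real_frac))
    n_synth = batch_size - n_real
    # Cycle each shuffled pool independently; the small real pool is
    # simply revisited more often (hence "bootstrapping" the batch).
    real_pool = itertools.cycle(rng.sample(real_idx, len(real_idx)))
    synth_pool = itertools.cycle(rng.sample(synth_idx, len(synth_idx)))
    if n_batches is None:
        n_batches = len(synth_idx) // max(1, n_synth)
    for _ in range(n_batches):
        yield ([next(real_pool) for _ in range(n_real)]
               + [next(synth_pool) for _ in range(n_synth)])
```

The point of the design is that the real fraction is a constant of the loader, not a property of the dataset mix, so enlarging the synthetic pool only adds diversity instead of shifting the training distribution.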

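The 16-pattern FOA rotation amounts to cheap channel-level transforms: 4 azimuth rotations x azimuth reflection x elevation flip = 16 views. The sketch below assumes ACN channel order [W, Y, Z, X] and is our illustration of the idea, not the repository's exact implementation.

```python
import numpy as np

# (cos, sin) pairs for azimuth rotations of 0, 90, 180, 270 degrees
ROTS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def foa_tta_patterns(wave):
    """Return the 16 rotated/reflected views of an FOA clip.

    wave: array of shape (4, T), channels in ACN order [W, Y, Z, X].
    Rotating a first-order sound field about the vertical axis mixes the
    X and Y channels; reflecting the azimuth (phi -> -phi) negates Y;
    flipping the elevation negates Z. The same transforms applied to the
    DOA labels give the per-view targets; at test time the 16 predictions
    are inverse-transformed and averaged.
    """
    w, y, z, x = wave
    views = []
    for c, s in ROTS:                       # azimuth rotation
        x_rot = c * x - s * y
        y_rot = s * x + c * y
        for y_sign in (1, -1):              # azimuth reflection
            for z_sign in (1, -1):          # elevation flip
                views.append(np.stack([w, y_sign * y_rot, z_sign * z, x_rot]))
    return views
```

Because every transform is a signed permutation-style mixing of channels, the augmented views preserve the signal energy and cost almost nothing compared with re-running the feature extractor on resampled audio.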
Getting Started

Environment

  • OS: Ubuntu
  • Language: Python
  • Framework: PyTorch

Datasets

Our model was trained and evaluated using a combination of real-world and synthetic datasets:

  • External Datasets: We utilized sound samples synthesized from five external sources: AudioSet [1], FSD50K [2], ESC-50 [3], IRMAS [4], and Wearable SELD [5].
  • Previous DCASE Challenge Data: We incorporated the synthetic SELD datasets from previous DCASE Challenges (2020 and 2021) [6, 7], which were generated using similar simulation techniques.
  • Real-world Data: We used the STARSS22 dataset [8], which contains the real-world soundscapes provided for the DCASE 2022 Challenge.

All simulated soundscapes were generated using the official data generation repository provided by the challenge organizers.

[1]   J. F. Gemmeke, et al., “Audio Set: An ontology and human-labeled dataset for audio events,” in Proc. IEEE ICASSP, 2017.
[2]   E. Fonseca, et al., “FSD50K: An Open Dataset of Human-Labeled Sound Events,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 30, pp. 829-852, 2022.
[3]   K. J. Piczak, “ESC: Dataset for Environmental Sound Classification,” in Proc. ACM Conference on Multimedia, 2015.
[4]   J. J. Bosch, et al., “A Comparison of Sound Segregation Techniques for Predominant Instrument Recognition in Musical Audio Signals,” in Proc. ISMIR, 2012.
[5]   K. Nagatomo, et al., “Wearable SELD Dataset: Dataset for Sound Event Localization and Detection Using Wearable Devices Around Head,” in Proc. ICASSP, 2022.
[6]   A. Politis, et al., “A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection,” in Proc. DCASE2020 Workshop, 2020.
[7]   A. Politis, et al., “A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection,” in Proc. DCASE2021 Workshop, 2021.
[8]   A. Politis, et al., “STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events,” in Proc. DCASE2022 Workshop, 2022.

Description

This repository is built upon the official DCASE Challenge Baseline Repository.
The core components are organized as follows:

  • custom_model.py defines the model architecture, featuring a SE-ResNet34 backbone integrated with BiGRU and ADPIT-based SELD prediction heads.
  • parameters.py contains a set of hyperparameters for both the training and inference phases.
  • main_train_model.py is the primary script for model training.
  • main_test_model.py loads the model from a specified weight path (defined in parameters.py) and evaluates its SELD performance.
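The ADPIT-based heads produce a multi-ACCDOA style output: per-track Cartesian DOA vectors per class, whose vector norm doubles as the activity score. As a rough sketch of how such an output is decoded into discrete events (the tensor shape, the 0.5 activity threshold, and the function name are illustrative assumptions, not necessarily the settings used in `parameters.py`):

```python
import numpy as np

def decode_multi_accdoa(pred, thresh=0.5):
    """Decode a multi-ACCDOA tensor into discrete SELD events.

    pred: array of shape (frames, tracks, 3, classes) holding Cartesian
    DOA vectors; a class is considered active on a track when the norm
    of its vector exceeds `thresh`.
    Returns a list of (frame, track, class, azimuth_deg, elevation_deg).
    """
    x, y, z = pred[:, :, 0, :], pred[:, :, 1, :], pred[:, :, 2, :]
    norm = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    events = []
    for f, t, c in zip(*np.nonzero(norm > thresh)):
        az = np.degrees(np.arctan2(y[f, t, c], x[f, t, c]))
        el = np.degrees(np.arctan2(z[f, t, c],
                                   np.hypot(x[f, t, c], y[f, t, c])))
        events.append((int(f), int(t), int(c), float(az), float(el)))
    return events
```

Coupling detection and localization in one vector is what lets a single regression head drive both halves of the SELD metric.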

Citation

J. S. Kim, et al., "A robust framework for sound event localization and detection on real recordings," Tech. Rep., DCASE2022 Challenge, 2022.

@techreport{kim2022_dcase,
    title={A robust framework for sound event localization and detection on real recordings},
    author={Kim, Jin Sob and Park, Hyun Joon and Shin, Wooseok and Han, Sung Won},
    institution={DCASE2022 Challenge},
    year={2022},
    month={June}
}
@article{kim2025_arxiv,
  title={A Robust framework for sound event localization and detection on real recordings},
  author={Kim, Jin Sob and Park, Hyun Joon and Shin, Wooseok and Han, Sung Won},
  journal={arXiv preprint arXiv:2512.22156},
  year={2025}
}

License

This repository is released under the MIT license.
