Dawson-ma/Speaker-Classification

Speaker Classification

This project classifies speakers from speech signals. The model reaches 97.06% accuracy on the task, placing in the top 6% of 1000 teams.

Data Preprocessing

The original dataset is VoxCeleb. Preprocessing converts the audio into feature vectors based on the mel-frequency spectrum. The signal is first transformed into the frequency domain with the Discrete Fourier Transform to obtain a spectrum. A filter bank, a log transform, and a Discrete Cosine Transform are then applied to build each vector. Finally, a window of length 128 is randomly selected from the resulting vectors. The processed dataset is stored here.
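
The steps above (DFT, filter bank, log, DCT, random window) can be sketched for a single frame as follows. This is a minimal illustration using only the Python standard library, not the project's actual preprocessing code. The function names, the 16 kHz sample rate, the number of filters and coefficients, and the reading of "window length of 128" as 128 consecutive feature vectors are all assumptions made for the example. In practice a library such as torchaudio or librosa would normally do this work.

```python
import cmath
import math
import random


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT of one frame; returns power for bins 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical sizes)."""
    num_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2)
    # num_filters + 2 edge points, expressed as fractional FFT-bin positions
    edges = [mel_to_hz(max_mel * i / (num_filters + 1)) * n_fft / sample_rate
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            rising = (b - left) / (center - left)
            falling = (right - b) / (right - center)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]


def feature_vector(frame, bank, num_coeffs):
    """Spectrum -> filter bank -> log -> DCT for one frame."""
    power = power_spectrum(frame)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
    return dct_ii(log_energies, num_coeffs)


def random_crop(features, length=128, rng=random):
    """Pick a random contiguous window of `length` feature vectors."""
    if len(features) <= length:
        return list(features)
    start = rng.randrange(len(features) - length + 1)
    return features[start:start + length]
```

A typical use would compute `feature_vector` for every frame of an utterance and then call `random_crop` on the resulting list, so that every training example has the same length.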

Model Training

The model is based on Conformer and is trained on the preprocessed dataset.

Model Improvement Techniques

To enhance model performance, the following techniques were employed:

  1. Additive Margin Softmax
  2. Cosine learning rate scheduler

Dataset

The preprocessed VoxCeleb dataset is the primary data source for this project and is accessible here.

For detailed implementation and usage instructions, please refer to the provided code.
