Speaker Classification

This project classifies speakers from speech signals. The model reaches 97.06% accuracy on the task, ranking in the top 6% of 1000 teams.

Data Preprocessing

The original dataset is VoxCeleb. Preprocessing converts each utterance into a sequence of mel-frequency feature vectors. The signal is first transformed into the frequency domain with the Discrete Fourier Transform to obtain a spectrum. A mel filter bank, a log transform, and a Discrete Cosine Transform are then applied to construct each vector. A window of length 128 is then taken from each sequence of vectors at a random position. The processed dataset is stored here.
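The steps above can be sketched with NumPy as follows. This is a minimal illustration, not the code used to build the stored dataset: the sample rate, FFT size, hop length, number of filters, number of coefficients, and all function names are assumptions chosen for the example, and the window length of 128 is interpreted as 128 frames.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full((n_coeffs, 1), np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return x @ (basis * scale).T

def mel_cepstral_features(signal, sample_rate=16000, n_fft=512,
                          hop_len=160, n_filters=40, n_coeffs=40):
    """DFT -> mel filter bank -> log -> DCT. All default values are assumptions."""
    frames = frame_signal(signal, n_fft, hop_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)    # small constant avoids log(0)
    return dct_ii(log_mel, n_coeffs)

def random_window(features, window_len=128, rng=None):
    """Take window_len consecutive frames starting at a random position."""
    rng = np.random.default_rng() if rng is None else rng
    if len(features) <= window_len:
        return features                     # too short to crop; returned unchanged
    start = rng.integers(0, len(features) - window_len + 1)
    return features[start:start + window_len]
```

For example, `random_window(mel_cepstral_features(waveform))` on a three-second, 16 kHz waveform returns an array of shape `(128, 40)` under these assumed settings.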

Model Training

The model is based on Conformer and is trained on the preprocessed dataset.

Model Improvement Techniques

To enhance model performance, the following techniques were employed:

Additive Margin Softmax
Cosine learning rate scheduler
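As a rough illustration of the two techniques listed above, the sketch below computes an Additive Margin Softmax loss and a cosine learning rate schedule in NumPy. It is not taken from the notebook: the function names, the margin of 0.35, the scale of 30, and the absence of a warmup phase are assumptions made for the example.

```python
import math
import numpy as np

def am_softmax_loss(embeddings, class_weights, labels, margin=0.35, scale=30.0):
    """Additive Margin Softmax: cross-entropy over scaled cosine similarities,
    with `margin` subtracted from the target-class cosine.
    margin and scale are assumed values, not the project's settings."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = emb @ w.T                                   # (batch, n_classes)
    logits = scale * cos
    rows = np.arange(len(labels))
    logits[rows, labels] -= scale * margin            # penalize the true class
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Decay the learning rate from base_lr to min_lr along half a cosine period."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The margin forces the target-class cosine to exceed the others by a fixed amount before the loss becomes small, which tends to pull embeddings of the same speaker closer together. The cosine schedule starts at `base_lr`, passes through the midpoint between `base_lr` and `min_lr` halfway through training, and ends at `min_lr`.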

Dataset

The preprocessed VoxCeleb dataset serves as the primary data source for this project, accessible here.

For implementation details and usage instructions, see the Speaker_classification.ipynb notebook.
