This project aims to develop a music genre classification model based on audio analysis. A dataset containing recordings of different musical genres was used to extract relevant features to train a classifier.
The dataset used is GTZAN, which contains audio files from 10 musical genres (blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, and rock).
The GTZAN recordings are already preprocessed, but audio supplied by a user is not. For the classifier to work with user-input audio, additional preprocessing is required to handle noise, background sounds, and echoes.
Steps taken:
- Signal normalization: Adjusting the signal amplitude to a standardized range to prevent extreme volume variations from affecting feature extraction.
- Noise removal:
  - Low-pass filter: Reduces high-frequency noise such as hissing and whistling.
  - Spectral gating: Removes constant background noise.
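The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration with NumPy and SciPy, not the notebook's actual code: the function names, the 8 kHz cutoff, the filter order, the gating threshold, and the idea of estimating the noise floor from the first half second of the clip are all assumptions chosen for the example.

```python
import numpy as np
from scipy import signal


def peak_normalize(y, peak=1.0):
    """Scale the signal so its largest absolute sample equals `peak`."""
    max_abs = np.max(np.abs(y))
    if max_abs == 0:
        return y.copy()  # silent input: nothing to scale
    return y * (peak / max_abs)


def low_pass(y, sr, cutoff_hz=8000.0, order=5):
    """Zero-phase Butterworth low-pass filter.

    cutoff_hz is an assumed value and must stay below sr / 2.
    """
    sos = signal.butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return signal.sosfiltfilt(sos, y)


def spectral_gate(y, sr, noise_seconds=0.5, threshold=1.5, nperseg=1024):
    """Mute time-frequency bins that stay close to a per-frequency noise floor.

    Assumption: the first `noise_seconds` of the clip contain only
    background noise, so they can serve as the noise estimate.
    """
    _, _, spec = signal.stft(y, fs=sr, nperseg=nperseg)
    magnitude = np.abs(spec)
    hop = nperseg // 2  # SciPy's default STFT overlap is nperseg // 2
    noise_frames = max(1, int(noise_seconds * sr / hop))
    noise_floor = magnitude[:, :noise_frames].mean(axis=1, keepdims=True)
    mask = magnitude > threshold * noise_floor  # keep only bins above the floor
    _, cleaned = signal.istft(spec * mask, fs=sr, nperseg=nperseg)
    return cleaned[: len(y)]


def preprocess(y, sr):
    """Apply the three steps in the order listed above."""
    return spectral_gate(low_pass(peak_normalize(y), sr), sr)
```

A hard on/off mask like this one tends to leave audible artifacts; a real implementation would likely smooth the mask over time and frequency.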
A RandomForest model was used to rank the features by their importance for classification. The 20 most important features were selected, and a correlation matrix was computed over them. Features with a correlation coefficient above 0.75 were removed, resulting in a final set of 12 features.
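One way this selection could look in scikit-learn and pandas is sketched below. The function name `select_features` and its parameters are hypothetical, and the rule for which member of a correlated pair to drop (here, the one with lower importance) is an assumption, since the text above does not specify it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def select_features(X, y, top_k=20, max_corr=0.75, seed=0):
    """Rank features by RandomForest importance, keep the top_k, then drop
    any feature too correlated with a higher-ranked feature already kept."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    ranked = importances.sort_values(ascending=False).index[:top_k]

    corr = X[ranked].corr().abs()  # absolute Pearson correlation
    kept = []
    for name in ranked:  # most important first
        if all(corr.loc[name, other] <= max_corr for other in kept):
            kept.append(name)
    return kept
```

Here `X` is expected to be a DataFrame with one column per extracted feature and `y` the genre labels. Walking the features in importance order means that, of two highly correlated features, the more informative one survives.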
A RandomForest classifier was trained with 80% of the data, while the remaining 20% was used for validation.
After training, the model achieved 73% accuracy on the validation split, which indicates reasonable performance even with noisy audio inputs.
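The 80/20 split and the accuracy measurement might be reproduced along these lines. This is a sketch on synthetic stand-in data (12 features, 10 classes), not the GTZAN features themselves, so the number it prints is unrelated to the 73% reported above. The helper name `train_and_validate`, the stratified split, and the hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_validate(X, y, seed=42):
    """Train on 80% of the data and report accuracy on the remaining 20%."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )  # stratify keeps the genre proportions equal in both splits (assumed)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_val, model.predict(X_val))


# Synthetic stand-in for the selected features: 12 columns, 10 classes.
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=8, n_redundant=0,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
model, accuracy = train_and_validate(X, y)
print(f"validation accuracy: {accuracy:.2f}")
```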
To run this project, follow these steps:
- Open the `recm.ipynb` file in a Jupyter Notebook or Google Colab.
- Generate an API key on Kaggle.
- Download the `kaggle.json` file and upload it to the notebook.
- Run all the cells in the notebook.
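For context on the `kaggle.json` step: the Kaggle API client looks for the token in `~/.kaggle/kaggle.json` by default and warns when the file is readable by other users. The notebook presumably takes care of this itself; the sketch below only shows what such a step could look like, and the helper name `install_kaggle_credentials` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_credentials(uploaded="kaggle.json", config_dir=None):
    """Copy an uploaded API token to the directory the Kaggle client reads."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(uploaded, target)
    os.chmod(target, 0o600)  # restrict access to the current user
    return target


# Usage, assuming kaggle.json was uploaded to the working directory:
# install_kaggle_credentials("kaggle.json")
```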
To test the model with an audio file, simply upload your audio file in the notebook (the code specifies where to do this).
That's it! 🚀 Feel free to contribute.




