Music Mood Classification Using the Million Song Dataset
Given audio features for a song, can we predict what mood the song represents?
Do audio features help with mood identification?
- Metadata generation
- Predicting success ("Hit Song Science")
- Recommender Systems
|Million Song Dataset||Spotify API|
|Artist, Song title||Speechiness|
|Key, Mode, Time Signature||Instrumentalness|
|Segments Pitches (Chroma features, 2D)|| |
|Segments Timbre (MFCC + PCA, 2D)|| |
Hand-labelled 7,396 songs as either 'happy' or 'sad'. The train/test split is 60/40.
Imputing missing values
- All songs in the Million Song Subset (10,000 songs) had 0 for Energy and Danceability, i.e., they had not been analysed.
- Used Spotify's Web API to fetch Danceability, Energy, Acousticness, Instrumentalness and Speechiness metrics.
- If a song from the dataset was not on Spotify, I replaced each missing value with the mean of that feature.
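The mean imputation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names, the toy values and the helper `impute_with_mean` are assumptions, and `NaN` stands in for a song that could not be found on Spotify.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for the five Spotify metrics listed above.
SPOTIFY_FEATURES = ["danceability", "energy", "acousticness",
                    "instrumentalness", "speechiness"]


def impute_with_mean(df, columns=SPOTIFY_FEATURES):
    """Return a copy of df where NaNs in each column are replaced by that column's mean."""
    out = df.copy()
    for col in columns:
        # Series.mean() skips NaN by default, so the mean uses observed values only.
        out[col] = out[col].fillna(out[col].mean())
    return out


# Toy data (made up): NaN marks a metric that could not be fetched.
songs = pd.DataFrame({
    "danceability":     [0.8, np.nan, 0.4],
    "energy":           [0.9, 0.5, np.nan],
    "acousticness":     [0.1, 0.3, 0.5],
    "instrumentalness": [np.nan, 0.0, 0.2],
    "speechiness":      [0.05, 0.15, np.nan],
})
imputed = impute_with_mean(songs)
```

Mean imputation keeps every song in the training set, at the cost of pulling the imputed songs toward the centre of each feature's distribution.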
Understanding the data
Low Level Segment Features
Speechiness, Danceability, Tempo, Loudness, Energy, Acousticness, Instrumentalness
- Square loudness (dB) for interpretability
- Standardize energy, tempo and loudness to zero mean and unit variance (mean = 0, variance = 1)
- Segment aggregation: Convert segment-level 2D information into track-level 1D features
- Key * Mode, Tempo * Mode to capture multiplicative interactions
- A segment is 0.3 seconds long. Each segment has a 12-dimensional pitch vector and a 12-dimensional timbre vector.
- Pitch: 2D array of Chroma features. The shape varies from (100, 12) to (1600, 12).
- Timbre: 2D array of MFCC features. Shape varies from (100, 12) to (1600, 12).
- Mel Frequency Cepstral Coefficients (MFCC) capture the logarithmic perception of loudness and pitch as heard by a human.
- Aggregation: Calculate the min, max, kurtosis, mean, standard deviation and variance of each segment, then average each statistic over all segments of the track
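The feature engineering steps above can be sketched roughly as below. This is one possible reading, not the project's confirmed implementation: the function names, column names and toy values are hypothetical, the six statistics are assumed to be taken across the 12 coefficients of each segment, and the interaction terms are assumed to be built from raw values before standardization.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis


def aggregate_segments(segments):
    """Collapse a (n_segments, 12) array into a 6-element track-level vector.

    Each row (segment) is summarised by six statistics over its 12
    coefficients; each statistic is then averaged over all segments.
    """
    segments = np.asarray(segments, dtype=float)
    per_segment = np.column_stack([
        segments.min(axis=1),
        segments.max(axis=1),
        kurtosis(segments, axis=1),
        segments.mean(axis=1),
        segments.std(axis=1),
        segments.var(axis=1),
    ])
    return per_segment.mean(axis=0)


def add_descriptive_features(df):
    """Add Key*Mode and Tempo*Mode, then standardize energy, tempo and loudness."""
    out = df.copy()
    # Assumption: interactions use the raw (unscaled) values.
    out["key_x_mode"] = out["key"] * out["mode"]
    out["tempo_x_mode"] = out["tempo"] * out["mode"]
    for col in ["energy", "tempo", "loudness"]:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out


# Synthetic stand-in for one track's timbre array of shape (n_segments, 12).
rng = np.random.default_rng(0)
timbre = rng.normal(size=(100, 12))
track_vector = aggregate_segments(timbre)

# Toy track-level table with made-up values.
tracks = pd.DataFrame({
    "key": [0, 5, 7, 11],
    "mode": [1, 0, 1, 0],
    "tempo": [120.0, 90.0, 140.0, 100.0],
    "energy": [0.9, 0.3, 0.7, 0.4],
    "loudness": [-5.0, -12.0, -7.0, -10.0],
})
features = add_descriptive_features(tracks)
```

Because the number of segments varies per track, aggregating to a fixed-length vector is what lets every song share the same feature columns.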
Recursive Feature Elimination with a Random Forest Classifier and 5-fold cross-validation
Used 25 of a possible 52 features
Compared feature importance by model
Most important descriptive feature was Danceability, followed by Energy, Speechiness and Beats
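A minimal sketch of this selection step with scikit-learn's `RFECV` is shown below. The synthetic data from `make_classification`, the number of trees and the random seeds are assumptions chosen to keep the example small; the real project started from 52 candidate features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for the song feature matrix and happy/sad labels.
X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=5, random_state=0)

# Drop one feature per round, scoring each subset with 5-fold cross-validation.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    step=1,
    cv=5,
    scoring="accuracy",
)
selector.fit(X, y)

n_selected = selector.n_features_      # size of the best-scoring subset
selected_mask = selector.support_      # boolean mask over the original columns

# Importances of the final forest, which is refit on the selected columns only.
importances = selector.estimator_.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important first
```

`selected_mask` can be applied to the original column names to see which features survived, and `ranking` orders the survivors by importance.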
Modelwise Feature Importance
|Model||Features||CV score (%)||Test accuracy (%)|
|Random Forest||Segment + Desc||73.33||75.44|
|XGBoost||Segment + Desc||73.33||75.24|
|Gradient Boosting||Segment + Desc||72.65||74.39|
|Extra Trees||Segment + Desc||68.44||73.86|
|SVM||Segment + Desc||73.33||73.26|
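A comparison of this kind could be produced along the following lines. This is a sketch on synthetic data, so its numbers will not match the table; hyperparameters are library defaults or arbitrary, and XGBoost is left out here only to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 25 selected features and binary mood labels.
X, y = make_classification(n_samples=400, n_features=25,
                           n_informative=8, random_state=0)

# 60/40 train/test split, stratified so both moods keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    # Mean 5-fold cross-validation accuracy on the training portion only.
    cv_score = cross_val_score(model, X_train, y_train, cv=5).mean()
    # Refit on the full training portion and score on the held-out 40%.
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = (100 * cv_score, 100 * test_acc)

for name, (cv_score, test_acc) in results.items():
    print(f"{name:18s} CV {cv_score:6.2f}  test {test_acc:6.2f}")
```

Keeping cross-validation inside the training portion means the test accuracy is measured on songs that played no part in model selection.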
- K-Nearest Neighbours with Mahalanobis distance
- Exploring whether lyrics can be added as features
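As a rough sketch of the first idea, Mahalanobis distance can be obtained from ordinary Euclidean K-Nearest Neighbours by whitening the features first: if the inverse covariance factors as `L @ L.T` (Cholesky), Euclidean distance on `X @ L` equals Mahalanobis distance on `X`. The helper name, the synthetic data and `n_neighbors=5` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def mahalanobis_whitener(X_train):
    """Return W such that Euclidean distance on X @ W is Mahalanobis distance on X."""
    inv_cov = np.linalg.inv(np.cov(X_train, rowvar=False))
    # Lower-triangular Cholesky factor: inv_cov == W @ W.T
    return np.linalg.cholesky(inv_cov)


# Synthetic data with no redundant columns, so the covariance is invertible.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

# Estimate the covariance on training data only, then whiten both splits.
W = mahalanobis_whitener(X_train)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train @ W, y_train)
accuracy = knn.score(X_test @ W, y_test)
```

Unlike plain Euclidean distance, this accounts for correlated features and differing scales, which matters when audio descriptors such as energy and loudness move together.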
Visualization: Seaborn, Matplotlib
Models: scikit-learn, XGBoost, pandas, NumPy
Spotify API wrapper: Spotipy
Data Wrangling: SQL
By Bhavika Tekwani