Music Mood Classification Using the Million Song Dataset


Bhavika Tekwani


Given audio features for a song, can we predict what mood the song represents?


Do audio features help with mood identification?




  • Indexing
  • Metadata generation
  • Predicting success ("Hit Song Science")
  • Recommender Systems


Million Song Dataset                        Spotify API
Artist, Song title                          Speechiness
Duration                                    Energy
Loudness                                    Acousticness
Key, Mode, Time Signature                   Instrumentalness
Tempo                                       Danceability
Segments Pitches (Chroma features, 2D)
Segments Timbre (MFCC + PCA, 2D)

Hand-labelled 7,396 songs as 'happy' or 'sad'. The train/test split is 60/40.

Imputing missing values

  • Every song in the Million Song Subset (10,000 songs) had 0 for Energy and Danceability, i.e. these fields had not been analysed.

  • Used Spotify's Web API to fetch Danceability, Energy, Acousticness, Instrumentalness and Speechiness metrics.

  • If a song from the dataset was not on Spotify, I imputed each missing feature with the mean of that feature.
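A minimal sketch of the mean imputation described above, assuming the Spotify features sit in a pandas DataFrame and that a song missing from Spotify shows up as NaN. The table and its column names are hypothetical, not taken from the original project.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: NaN marks a song that was not found on Spotify.
songs = pd.DataFrame({
    "danceability": [0.50, np.nan, 0.70],
    "energy": [0.80, 0.40, np.nan],
})

# Replace each missing value with the mean of the observed values
# in the same column (pandas skips NaN when computing the mean).
imputed = songs.fillna(songs.mean())
```

Mean imputation keeps every song in the dataset, at the cost of pulling the imputed songs toward the centre of each feature's distribution.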


Understanding the data

Low Level Segment Features



Descriptive Features

Speechiness, Danceability, Tempo, Loudness, Energy, Acousticness, Instrumentalness


Notational Features

Key, Mode, Time Signature

Feature Engineering

  • Square loudness (dB) for interpretability


  • Standardize energy, tempo and loudness to zero mean and unit variance (mean = 0, variance = 1)


  • Segment aggregation: Convert segment level 2D information to track level 1D feature


  • Key * Mode, Tempo * Mode to capture multiplicative interaction
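The track-level steps above could be sketched as follows. This is an illustration under assumptions: the column names are hypothetical, the squared loudness is stored as an extra column, and the interaction terms are built from the raw (unscaled) values. The original slides do not say which of these choices were made.

```python
import pandas as pd

# Hypothetical toy tracks; column names are assumptions for illustration.
tracks = pd.DataFrame({
    "loudness": [-5.0, -10.0, -15.0, -20.0],  # dB
    "tempo": [90.0, 120.0, 150.0, 120.0],     # beats per minute
    "energy": [0.2, 0.4, 0.6, 0.8],
    "key": [0, 5, 7, 11],                     # pitch class 0-11
    "mode": [1, 0, 1, 0],                     # 1 = major, 0 = minor
})

features = tracks.copy()

# Multiplicative interaction terms, built from the raw values (assumption).
features["key_x_mode"] = tracks["key"] * tracks["mode"]
features["tempo_x_mode"] = tracks["tempo"] * tracks["mode"]

# Squared loudness, kept as a separate column (assumption).
features["loudness_sq"] = tracks["loudness"] ** 2

# Standardize to zero mean and unit variance (population standard deviation).
for col in ["energy", "tempo", "loudness"]:
    features[col] = (tracks[col] - tracks[col].mean()) / tracks[col].std(ddof=0)
```

In a real pipeline the means and standard deviations would be computed on the training split only and then reused on the test split.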


Segment Aggregation



  • A segment is roughly 0.3 seconds long. Each segment has a pitch vector and a timbre vector.

  • Pitch: 2D array of Chroma features. The shape varies from (100, 12) to (1600, 12).


  • Timbre: 2D array of MFCC features. Shape varies from (100, 12) to (1600, 12).

  • Mel Frequency Cepstral Coefficients (MFCCs) capture the logarithmic perception of loudness and pitch as heard by a human.

  • Aggregation: For each segment, calculate the min, max, kurtosis, mean, standard deviation and variance of its 12 values, then average each statistic over all segments in the track
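One reading of the aggregation step, sketched with NumPy and SciPy: compute the six statistics across the 12 values of each segment, then average each statistic over the segments of a track. The function name `aggregate_segments` and the random input arrays are hypothetical, and the original implementation may have aggregated along a different axis.

```python
import numpy as np
from scipy.stats import kurtosis

def aggregate_segments(segments):
    """Collapse an (n_segments, 12) array into six track-level values."""
    segments = np.asarray(segments, dtype=float)
    # One row per segment, one column per statistic.
    per_segment = np.column_stack([
        segments.min(axis=1),
        segments.max(axis=1),
        kurtosis(segments, axis=1),  # Fisher (excess) kurtosis by default
        segments.mean(axis=1),
        segments.std(axis=1),
        segments.var(axis=1),
    ])
    # Average each statistic over all segments of the track.
    return per_segment.mean(axis=0)

# Random stand-ins for one track's pitch and timbre arrays.
rng = np.random.default_rng(0)
pitches = rng.random((100, 12))
timbre = rng.normal(size=(1600, 12))

# Six values from pitch plus six from timbre give a 1D track-level vector.
track_features = np.concatenate(
    [aggregate_segments(pitches), aggregate_segments(timbre)]
)
```

Because the output length does not depend on the number of segments, tracks of different durations end up with feature vectors of the same size.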



Feature Selection

  • Recursive Feature Elimination with Random Forest Classifier and 5 fold cross validation

  • Used 25 of a possible 52 features

  • Compared feature importance by model

  • Most important descriptive feature was Danceability, followed by Energy, Speechiness and Beats
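A sketch of recursive feature elimination with a random forest and 5-fold cross-validation, using scikit-learn's `RFECV`. The data here is synthetic and the forest settings are assumptions chosen to keep the example fast, so the number of features it keeps will not match the 25 of 52 reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for the song features and the happy/sad labels.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, random_state=0)

# Drop one feature per step, scoring each subset size with 5-fold CV.
selector = RFECV(forest, step=1, cv=5, scoring="accuracy")
selector.fit(X, y)

selected = selector.support_  # boolean mask over the feature columns
print(selector.n_features_, "features kept")
```

`selector.transform(X)` then returns only the retained columns, and `selector.ranking_` gives the elimination order of the rest.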

Modelwise Feature Importance

Model Comparison

Model              Features        CV score (%)   Test accuracy (%)
Random Forest      Segment + Desc  73.33          75.44
Random Forest      Segment         71.73          73.13
XGBoost            Segment + Desc  73.33          75.24
XGBoost            Segment         71.73          73.10
Gradient Boosting  Segment + Desc  72.65          74.39
Gradient Boosting  Segment         71.12          72.87
Extra Trees        Segment + Desc  68.44          73.86
Extra Trees        Segment         68.14          71.81
SVM                Segment + Desc  73.33          73.26
SVM                Segment         71.81          69.97
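The two score columns could be produced along these lines. This sketch uses synthetic data and a random forest, and it assumes the CV score is a 5-fold mean accuracy on the training portion of a 60/40 split. The slides do not state how the CV score was computed, so treat that part as a guess.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the labelled songs.
X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# 60/40 train/test split, stratified so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Mean accuracy over 5 folds of the training data (assumed CV protocol).
cv_score = cross_val_score(model, X_train, y_train, cv=5).mean()

# Accuracy on the held-out 40%.
test_accuracy = model.fit(X_train, y_train).score(X_test, y_test)

print(f"CV score: {100 * cv_score:.2f}  Test accuracy: {100 * test_accuracy:.2f}")
```

Swapping `RandomForestClassifier` for another scikit-learn estimator reuses the same split and scoring, which keeps the comparison between models like for like.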

Further Work

  • K-Nearest Neighbours with Mahalanobis distance


  • Exploring whether lyrics can be added as features 


Visualization: Seaborn, Matplotlib

Models: scikit-learn, XGBoost, pandas, NumPy

Spotify API wrapper: Spotipy

Data Wrangling: SQL


Thank You.
