Music Mood Classification Using the Million Song Dataset

 

Bhavika Tekwani

Problem

Given audio features for a song, can we predict what mood the song represents?

 

Do audio features help with mood identification?

 

 

Motivation

  • Indexing
  • Metadata generation
  • Predicting success ("Hit Song Science")
  • Recommender Systems

Data

Million Song Dataset                      Spotify API
Artist, Song title                        Speechiness
Duration                                  Energy
Loudness                                  Acousticness
Key, Mode, Time Signature                 Instrumentalness
Tempo                                     Danceability
Segments Pitches (Chroma features, 2D)
Segments Timbre (MFCC + PCA, 2D)
Beats

Hand-labelled 7,396 songs as either 'happy' or 'sad'. The train/test split is 60/40.

Imputing missing values

  • All songs in the Million Song Subset (10,000 songs) had 0 for Energy and Danceability, i.e. these features had not been analysed.

  • Used Spotify's Web API to fetch Danceability, Energy, Acousticness, Instrumentalness and Speechiness metrics.

  • If a song from the dataset was not on Spotify, I imputed the mean of the feature as the missing value.

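The mean imputation described above can be sketched as follows. This is a minimal illustration, not the code used for the project: the toy matrix, the `SPOTIFY_FEATURES` column order and the use of NaN to mark a song that was not found on Spotify are all assumptions made for the example.

```python
import numpy as np

# Hypothetical column order for the five metrics fetched from Spotify.
SPOTIFY_FEATURES = ["danceability", "energy", "acousticness",
                    "instrumentalness", "speechiness"]


def impute_column_means(features: np.ndarray) -> np.ndarray:
    """Replace each NaN with the mean of the observed values in its column."""
    filled = features.copy()
    column_means = np.nanmean(filled, axis=0)  # means ignore the NaN entries
    missing_rows, missing_cols = np.where(np.isnan(filled))
    filled[missing_rows, missing_cols] = column_means[missing_cols]
    return filled


# Toy data (assumption): NaN marks a song that was not on Spotify.
songs = np.array([
    [0.50, 0.80, 0.10, 0.00, 0.05],
    [np.nan, np.nan, np.nan, np.nan, np.nan],
    [0.70, 0.40, 0.30, 0.20, 0.15],
])
imputed = impute_column_means(songs)
```

One design choice worth weighing: computing the column means on the training split only, and reusing them for the test split, keeps test information out of the imputed values.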

Understanding the data

Low Level Segment Features

Timbre

Pitch

Descriptive Features

Speechiness, Danceability, Tempo, Loudness, Energy, Acousticness, Instrumentalness

 

Notational Features

Key, Mode, Time Signature

Feature Engineering

  • Square loudness (dB) for interpretability

 

  • Standardize energy, tempo and loudness (mean = 0, variance = 1)

 

  • Segment aggregation: Convert segment level 2D information to track level 1D feature

 

  • Key * Mode, Tempo * Mode to capture multiplicative interaction
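The scalar feature engineering steps above might look roughly like this in NumPy. It is a sketch under assumptions: the slides do not say whether the interaction terms use raw or standardized tempo (raw is used here), and the function and dictionary key names are hypothetical.

```python
import numpy as np


def standardize(values: np.ndarray) -> np.ndarray:
    """Rescale values to zero mean and unit variance (z-scores)."""
    return (values - values.mean()) / values.std()


def engineer_features(loudness_db, energy, tempo, key, mode):
    """Build the squared, standardized and interaction features per track."""
    loudness_db = np.asarray(loudness_db, dtype=float)
    energy = np.asarray(energy, dtype=float)
    tempo = np.asarray(tempo, dtype=float)
    key = np.asarray(key, dtype=float)
    mode = np.asarray(mode, dtype=float)
    return {
        "loudness_squared": loudness_db ** 2,
        "loudness_scaled": standardize(loudness_db),
        "energy_scaled": standardize(energy),
        "tempo_scaled": standardize(tempo),
        # Assumption: interactions are plain products of the raw values.
        "key_x_mode": key * mode,
        "tempo_x_mode": tempo * mode,
    }


# Toy values for three tracks (assumption, not taken from the dataset).
features = engineer_features(
    loudness_db=[-5.0, -10.0, -15.0],
    energy=[0.2, 0.5, 0.8],
    tempo=[90.0, 120.0, 150.0],
    key=[0, 5, 7],
    mode=[1, 0, 1],
)
```

Because mode is 0 or 1 in this toy encoding, each interaction term equals the raw key or tempo when mode is 1 and zero otherwise.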

 

Segment Aggregation

 

 

  • A segment is 0.3 seconds long. Each segment has a pitch and timbre.

  • Pitch: 2D array of Chroma features. The shape varies from (100, 12) to (1600, 12).

 

  • Timbre: 2D array of MFCC features. Shape varies from (100, 12) to (1600, 12).

  • Mel Frequency Cepstral Coefficients (MFCCs) capture the logarithmic perception of loudness and pitch as heard by a human.

  • Aggregation: Calculate the min, max, kurtosis, mean, standard deviation and variance of each segment, then average each statistic over all segments in the track
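One reading of the aggregation step, sketched in NumPy: compute the six statistics across the 12 values of each segment, then average each statistic over all segments, so that a (n_segments, 12) array becomes a 6-element track-level vector. The function names are hypothetical, and the use of excess kurtosis with biased moments (the default convention of `scipy.stats.kurtosis`) is an assumption.

```python
import numpy as np


def kurtosis(values: np.ndarray, axis: int) -> np.ndarray:
    """Excess kurtosis (fourth standardized moment minus 3) along an axis."""
    centered = values - values.mean(axis=axis, keepdims=True)
    variance = (centered ** 2).mean(axis=axis)
    fourth_moment = (centered ** 4).mean(axis=axis)
    return fourth_moment / variance ** 2 - 3.0


def aggregate_segments(segments: np.ndarray) -> np.ndarray:
    """Collapse a (n_segments, 12) array into a 6-element track vector."""
    per_segment = np.column_stack([
        segments.min(axis=1),
        segments.max(axis=1),
        kurtosis(segments, axis=1),
        segments.mean(axis=1),
        segments.std(axis=1),
        segments.var(axis=1),
    ])  # shape: (n_segments, 6)
    return per_segment.mean(axis=0)  # average each statistic over segments


# Synthetic stand-ins for one short and one long track (assumption).
rng = np.random.default_rng(0)
short_track = aggregate_segments(rng.random((100, 12)))
long_track = aggregate_segments(rng.random((1600, 12)))
```

Both tracks end up with vectors of the same length regardless of how many segments they contain, which is what lets pitch and timbre be used as fixed-width features. A segment whose 12 values are all identical has zero variance and would need special handling in the kurtosis term.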

 

 

Feature Selection

  • Recursive Feature Elimination with a Random Forest Classifier and 5-fold cross-validation

  • Used 25 of a possible 52 features

  • Compared feature importance by model

  • Most important descriptive feature was Danceability, followed by Energy, Speechiness and Beats
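Recursive feature elimination with a random forest and 5-fold cross-validation can be sketched with scikit-learn's `RFECV`. The synthetic 52-feature dataset, the `step` size and the forest size below are assumptions chosen to keep the example fast; they are not the settings behind the results on these slides, and `RFECV` picks the number of features itself rather than being fixed at 25.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for the 52 engineered features (assumption).
X, y = make_classification(
    n_samples=300, n_features=52, n_informative=10, random_state=0
)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    step=5,              # drop the 5 least important features per round
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
)
selector.fit(X, y)

selected_mask = selector.support_      # True for each retained feature
n_selected = selector.n_features_      # size of the best-scoring subset
feature_ranking = selector.ranking_    # 1 marks a selected feature
```

`selector.transform(X)` then returns only the retained columns, which can be passed to any of the models being compared.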

Modelwise Feature Importance

Model Comparison

Model              Features        CV score (%)  Test accuracy (%)
Random Forest      Segment + Desc  73.33         75.44
                   Segment         71.73         73.13
XGBoost            Segment + Desc  73.33         75.24
                   Segment         71.73         73.10
Gradient Boosting  Segment + Desc  72.65         74.39
                   Segment         71.12         72.87
Extra Trees        Segment + Desc  68.44         73.86
                   Segment         68.14         71.81
SVM                Segment + Desc  73.33         73.26
                   Segment         71.81         69.97
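A comparison of this kind could be produced along the following lines: hold out 40% of the songs, report mean cross-validated accuracy on the training split, then accuracy on the held-out split. This is a sketch on synthetic data with default hyperparameters. The number of folds for this step is an assumption (5 is used here), and XGBoost is left out only to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the selected features and happy/sad labels.
X, y = make_classification(n_samples=400, n_features=25, random_state=0)

# 60/40 train/test split, stratified to preserve the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
}

results = {}
for name, model in models.items():
    # Mean accuracy across the folds of the training split.
    cv_score = cross_val_score(
        model, X_train, y_train, cv=5, scoring="accuracy"
    ).mean()
    # Refit on the full training split and score on the held-out 40%.
    test_accuracy = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = (100 * cv_score, 100 * test_accuracy)

for name, (cv_score, test_accuracy) in results.items():
    print(f"{name:<18} CV {cv_score:6.2f}  test {test_accuracy:6.2f}")
```

Running the same loop once with segment features only and once with segment plus descriptive features would give the two rows per model shown in the table.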

Further Work

  • K-Nearest Neighbours with Mahalanobis distance

 

  • Exploring whether lyrics can be added as features 

Tools

Visualization: Seaborn, Matplotlib

Models: scikit-learn, XGBoost, pandas, NumPy

Spotify API wrapper: Spotipy

Data Wrangling: SQL

 

Thank You.
