A complete overview of the Shazam app

How Music Identification works

Marc Pomar - CORE Code School

https://www.corecode.school

An intro to how digital audio is recorded.

Let's create audio data!

Nyquist: To record valid audio data, your sample rate Fs must be at least 2 times the highest frequency F you want to capture.

From Real Value data to binary representation

Quantization Error: It is important to choose a resolution (bit depth) fine enough to capture the details across the dynamic range of the signal (peak to peak).
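A minimal sketch of the effect of resolution, assuming the signal is normalised to the range [-1, 1] and quantized uniformly. The helper names `quantize` and `max_error` are made up for this illustration.

```python
import math

def quantize(x, bits):
    """Round x in [-1.0, 1.0] to the nearest of 2**bits uniformly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)          # spacing between adjacent levels
    index = round((x + 1.0) / step)    # nearest level index, 0 .. levels - 1
    return -1.0 + index * step

# One cycle of a sine wave stands in for the real-valued input signal.
signal = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]

def max_error(bits):
    """Largest gap between a sample and its quantized value."""
    return max(abs(s - quantize(s, bits)) for s in signal)

for bits in (4, 8, 16):
    print(bits, "bits -> max quantization error", max_error(bits))
```

The error is never larger than half a step, so every extra bit roughly halves it.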

Signal Processing crash course

1. Acquire the signal at a uniform sample rate, Fs = 44100 Hz. The signal is now discrete in time.

2. Decide the resolution for the Quantization phase (e.g. 16 bits).

3. Process the data!

If you don't capture your data properly (Fs too low for the frequencies in the signal)... Aliasing!
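Aliasing can be checked with a few lines of arithmetic. The sketch below assumes a 30 kHz tone (an arbitrary choice above Fs/2) sampled at 44100 Hz, and shows that its samples are indistinguishable from those of a 14.1 kHz tone with inverted sign.

```python
import math

FS = 44100                    # sample rate in Hz
F_TOO_HIGH = 30000            # above the Nyquist limit FS / 2 = 22050 Hz
F_ALIAS = FS - F_TOO_HIGH     # 14100 Hz, the frequency the tone folds down to

def sample_tone(freq, n_samples, fs=FS):
    """Sample a sine of the given frequency at rate fs."""
    return [math.sin(2 * math.pi * freq * k / fs) for k in range(n_samples)]

too_high = sample_tone(F_TOO_HIGH, 64)
alias = sample_tone(F_ALIAS, 64)

# Sample by sample, the 30 kHz tone equals the 14.1 kHz tone with flipped sign,
# so their sum is zero up to floating-point rounding.
worst = max(abs(a + b) for a, b in zip(too_high, alias))
print("largest difference:", worst)
```

Once sampled, nothing in the data tells the two tones apart, which is why the high frequencies have to be filtered out before sampling.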

 

What happens in the "Real World" when you record audio with your smartphone so it can be recognized

Song being played

(Signal you want to detect)

Noise

Noise can be: people talking, background appliances, rain, etc.

 

Digital audio captured from microphone

(Your recorded wav file)

Solution: We have to design a noise-resilient system to identify x[k] (our song).
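A rough sketch of the recording model, assuming the noise simply adds to the song sample by sample. Only x[k] comes from the slide; the names `mix`, `snr_db`, `noise` and `recorded`, the 440 Hz test tone and the Gaussian noise are assumptions for illustration.

```python
import math
import random

def mix(song, noise):
    """Microphone model (assumed additive): recorded[k] = song[k] + noise[k]."""
    return [x + n for x, n in zip(song, noise)]

def snr_db(song, noise):
    """Signal-to-noise ratio in decibels, computed from average power."""
    p_signal = sum(x * x for x in song) / len(song)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

rng = random.Random(0)   # fixed seed so the run is repeatable
song = [math.sin(2 * math.pi * 440 * k / 44100) for k in range(44100)]
noise = [rng.gauss(0.0, 0.5) for _ in range(44100)]
recorded = mix(song, noise)
print("SNR (dB):", snr_db(song, noise))
```

At roughly 3 dB the noise carries about half the power of the song, the kind of condition an identification system has to survive.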

LEVEL UP 🚀

Let's move from the time domain to the frequency domain

 

The Fourier transform

\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\ e^{-2\pi i x \xi}\,dx

You can find the Fourier transform in the following applications...

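Application                          | FFT use
-------------------------------------|-----------------------------------
5G communication                     | Modulate signal (like FM radio)
Digital terrestrial television (TDT) | Modulate signal & compress video
JPEG file                            | Compress image data
YouTube                              | Compress video data to save space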

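Your first FFT: the sine function.

f(x) = sin(x)

\hat{f}(\xi) = \frac{1}{2i}\left[\delta\left(\xi - \frac{1}{2\pi}\right) - \delta\left(\xi + \frac{1}{2\pi}\right)\right]

A pure sine contains a single frequency, so its spectrum is a spike (a Dirac delta 𝛅) at that frequency, plus a mirror image at the negative frequency.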
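To see the single spike numerically, here is a small sketch using NumPy's real-input FFT. The 1000 Hz test frequency and the 4096-sample length are arbitrary choices for the example.

```python
import numpy as np

FS = 44100      # sample rate in Hz
N = 4096        # number of samples fed to the FFT (arbitrary)
F0 = 1000.0     # frequency of the test sine in Hz (arbitrary)

t = np.arange(N) / FS
signal = np.sin(2 * np.pi * F0 * t)

spectrum = np.abs(np.fft.rfft(signal))    # magnitudes for 0 .. FS / 2
freqs = np.fft.rfftfreq(N, d=1 / FS)      # frequency of each bin in Hz

peak_bin = int(np.argmax(spectrum))
print("strongest bin:", freqs[peak_bin], "Hz")
```

The strongest bin lands within one bin width (FS / N, about 10.8 Hz here) of the true frequency. Because the signal is finite, the spike is smeared over a few neighbouring bins instead of being a perfect delta.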

The FFT is a fast algorithm for computing the discrete Fourier transform (DFT): you input a signal with N samples and it returns a vector of 2^M values describing the frequency content of the N recorded samples.

Here N and M must be decided:

Example:

Fs = 44100 Hz. We want the FFT of 1 second of 8-bit audio (N = 44100 samples).

FFT resolution is M = 9 (2^9 = 512 samples).
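The numbers in this example can be worked out in a few lines. This sketch assumes the 512 in the example is the FFT size, that is, the number of samples processed per transform.

```python
FS = 44100      # sample rate in Hz
M = 9           # FFT resolution exponent from the example

fft_size = 2 ** M                   # 512 samples per transform
bin_width_hz = FS / fft_size        # spacing between adjacent frequency bins
window_seconds = fft_size / FS      # audio duration covered by one transform
useful_bins = fft_size // 2         # bins between 0 Hz and the Nyquist limit

print(fft_size, bin_width_hz, window_seconds, useful_bins)
```

Under that assumption each bin is about 86 Hz wide and one transform covers about 11.6 ms of audio, so a larger M buys finer frequency detail at the cost of coarser timing.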

Spectrogram from FFT data

Spectrogram using WebAudio & p5.js

https://codepen.io/boyander/full/NmqRJq

Capture the spectrogram.

 

1. Decide on a window size (example: 0.2 seconds).

2. Perform the FFT for each window of the recording, with 50% overlap between consecutive windows (p = 0.5).
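The two steps above can be sketched with NumPy as follows. The function name `spectrogram` and the Hann taper applied to each window are assumptions of this sketch, not something the slides specify.

```python
import numpy as np

def spectrogram(samples, fs, window_seconds=0.2, overlap=0.5):
    """Return (magnitudes, times, freqs) for overlapping FFT windows."""
    size = int(round(window_seconds * fs))     # samples per window
    hop = int(size * (1 - overlap))            # step between window starts
    taper = np.hanning(size)                   # assumption: Hann taper to reduce leakage
    starts = range(0, len(samples) - size + 1, hop)
    frames = np.array([samples[s:s + size] * taper for s in starts])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))   # one row per window
    times = np.array(starts) / fs              # start time of each window in seconds
    freqs = np.fft.rfftfreq(size, d=1 / fs)    # frequency of each column in Hz
    return magnitudes, times, freqs

fs = 44100
t = np.arange(fs) / fs                         # one second of audio
tone = np.sin(2 * np.pi * 440 * t)             # 440 Hz test tone
mags, times, freqs = spectrogram(tone, fs)
print(mags.shape)
```

With 0.2 s windows and 50% overlap, one second of audio yields 9 windows, each with 4411 frequency bins spaced 5 Hz apart.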

Fingerprinting

A method that allows robust identification in noisy environments.

1st - Find Peaks in Spectrogram
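One simple way to find peaks, sketched below, is to keep every spectrogram cell that is the largest value in its local neighbourhood and is loud enough. The function name `find_peaks`, the neighbourhood size and the magnitude threshold are illustrative assumptions, not tuned values.

```python
import numpy as np

def find_peaks(magnitudes, neighborhood=5, min_magnitude=1.0):
    """Return (time_index, freq_index) pairs that are local maxima.

    A cell counts as a peak when it is the largest value within
    `neighborhood` cells in both directions and reaches `min_magnitude`.
    """
    peaks = []
    n_times, n_freqs = magnitudes.shape
    for t in range(n_times):
        for f in range(n_freqs):
            value = magnitudes[t, f]
            if value < min_magnitude:
                continue
            patch = magnitudes[max(0, t - neighborhood):t + neighborhood + 1,
                               max(0, f - neighborhood):f + neighborhood + 1]
            if value == patch.max():
                peaks.append((t, f))
    return peaks

# Toy spectrogram: quiet random background with two planted peaks.
rng = np.random.default_rng(0)
toy = rng.random((20, 64)) * 0.5
toy[3, 10] = 5.0
toy[12, 40] = 4.0
print(find_peaks(toy))   # prints [(3, 10), (12, 40)]
```

Only the strongest points survive, which is what makes the approach robust: background noise rarely creates or removes the loudest peaks.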

2nd - Create Hashes from found peaks

For each peak, find its neighboring peaks, compute hash(F1, F2, t2-t1) for each pair, and store those hashes in indexed storage (e.g. a MySQL database).
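A minimal sketch of the hashing step. A plain Python dict stands in for the indexed storage, and the names `make_hashes`, `index_song` and `FAN_OUT`, the truncated SHA-1 digest and the number of neighbors per peak are all assumptions for illustration.

```python
import hashlib

FAN_OUT = 3   # assumption: how many following peaks each peak is paired with

def make_hashes(peaks, fan_out=FAN_OUT):
    """Yield (hash, t1) for pairs of nearby peaks.

    `peaks` is a list of (time, frequency) tuples.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()       # hash(F1, F2, t2-t1)
            yield hashlib.sha1(key).hexdigest()[:20], t1

def index_song(index, song_id, peaks):
    """Store every hash of a song together with the time it occurs at."""
    for h, t1 in make_hashes(peaks):
        index.setdefault(h, []).append((song_id, t1))

index = {}   # stand-in for an indexed database table
index_song(index, "song-a", [(0, 10), (1, 40), (3, 25), (4, 60)])
print(len(index), "hashes stored")
```

Because the hash uses the time difference t2-t1 and not the absolute times, a recording that starts in the middle of the song still produces the same hashes.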

Hash

MySQL Table

Now, repeat the process with your new audio sample and search for the same hashes in the database.
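The lookup can be sketched as a vote. The slide only says to search for the same hashes; counting how many matches agree on the same time offset between song and sample is an extra step assumed here to make the result more robust. The function name `best_match` and the toy hashes "h1" to "h3" are made up.

```python
from collections import Counter

def best_match(index, query_hashes):
    """Return (song_id, votes) for the song whose hashes line up best.

    `index` maps hash -> list of (song_id, time_in_song).
    `query_hashes` is a list of (hash, time_in_sample).
    """
    votes = Counter()
    for h, t_query in query_hashes:
        for song_id, t_song in index.get(h, []):
            # Matches from the right song share one consistent time offset.
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None, 0
    (song_id, _offset), count = votes.most_common(1)[0]
    return song_id, count

index = {
    "h1": [("song-a", 10), ("song-b", 3)],
    "h2": [("song-a", 12)],
    "h3": [("song-a", 15), ("song-b", 40)],
}
query = [("h1", 0), ("h2", 2), ("h3", 5)]
print(best_match(index, query))   # prints ('song-a', 3)
```

Here song-b shares two hashes with the sample, but at inconsistent offsets, so it collects only one vote per offset while song-a collects three.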

 

DEMO TIME

 

https://boyander.github.io/inside-shazam/

Questions?  Thank You!! 👏🏻

 

[DEMO] https://boyander.github.io/inside-shazam/

[CODE] github.com/boyander/identifier-frontend-samplecode

 

 

👨🏼‍💻 > marc@corecode.school

https://www.corecode.school
