A complete overview of the Shazam app

How Music Identification works

Marc Pomar - CORE Code School

https://www.corecode.school

An intro to how digital audio is recorded.

Let's create audio data!

https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem

Nyquist: To record valid audio data, your sampling rate Fs must be at least 2 times the highest frequency F you want to capture.
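A minimal sketch of the Nyquist rule as arithmetic. The function names and the 20 kHz hearing limit are illustrative assumptions, not part of the slides.

```python
def min_sample_rate(max_freq_hz: float) -> float:
    """Lowest sample rate (Hz) that still satisfies the Nyquist criterion."""
    return 2.0 * max_freq_hz


def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency (Hz) that a given sample rate can represent."""
    return sample_rate_hz / 2.0


# Assumption: roughly 20 kHz as the upper limit of human hearing.
HEARING_LIMIT_HZ = 20_000.0

print(min_sample_rate(HEARING_LIMIT_HZ))  # 40000.0
print(nyquist_frequency(44_100))          # 22050.0
```

This is why the CD-style rate of 44100 Hz is enough for audible content: its Nyquist frequency sits just above the assumed 20 kHz limit.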

From real-valued data to binary representation

Quantization Error: It is important to choose a proper resolution to ensure you can capture the details in the dynamic range of the signal (peak to peak).
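To make the resolution trade-off concrete, here is a small sketch of a uniform quantizer. It assumes a signal that spans -1.0 to 1.0 (peak to peak = 2.0) and skips clipping for brevity; `quantization_step` and `quantize` are hypothetical helper names.

```python
def quantization_step(bits: int, peak_to_peak: float = 2.0) -> float:
    """Distance between two adjacent quantization levels."""
    return peak_to_peak / (2 ** bits)


def quantize(value: float, bits: int, peak_to_peak: float = 2.0) -> float:
    """Round a real value to the nearest quantization level (no clipping)."""
    step = quantization_step(bits, peak_to_peak)
    return round(value / step) * step


sample = 0.3  # an arbitrary real-valued sample
for bits in (8, 16):
    error = abs(quantize(sample, bits) - sample)
    # The rounding error can never exceed half a step.
    print(bits, quantization_step(bits), error)
```

Each extra bit halves the step, so 16 bits gives a worst-case error 256 times smaller than 8 bits.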

Signal Processing crash course

1. Acquire the signal at a uniform sample rate Fs=44100 Hz. Signal is discrete now.

2. Decide the resolution for the Quantization phase (e.g. 16 bits).

3. Process the data!
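The three steps above can be sketched in a few lines. This is only an illustration: the 440 Hz test tone stands in for a real microphone, and `record_tone` is a hypothetical helper, not a real recording API.

```python
import math

FS = 44_100   # step 1: uniform sample rate in Hz
BITS = 16     # step 2: quantization resolution


def record_tone(freq_hz: float, duration_s: float) -> list[int]:
    """Sample a sine tone at FS and quantize it to signed BITS-bit integers."""
    n_samples = round(FS * duration_s)
    max_int = 2 ** (BITS - 1) - 1  # 32767 for 16 bits
    samples = []
    for k in range(n_samples):
        t = k / FS                                   # time of the k-th sample
        x = math.sin(2 * math.pi * freq_hz * t)      # real value in [-1, 1]
        samples.append(round(x * max_int))           # integer sample
    return samples


samples = record_tone(440.0, 0.01)

# Step 3: process the data, here just a peak measurement.
peak = max(abs(s) for s in samples)
print(len(samples), peak)
```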

If you don't capture your data properly... Aliasing!

More about aliasing:

Crazy Helicopter: https://www.youtube.com/watch?time_continue=3&v=yr3ngmRuGUc

Detailed explanation: https://www.youtube.com/watch?v=dNVtMmLlnoE&t=250s
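A quick numeric check of aliasing, assuming Fs=44100 Hz and an illustrative 30000 Hz tone (above Fs/2). `alias_frequency` is a hypothetical helper using the standard folding formula.

```python
import math

FS = 44_100


def alias_frequency(freq_hz: float, fs: float = FS) -> float:
    """Frequency that a tone appears to have after sampling at fs."""
    return abs(freq_hz - fs * round(freq_hz / fs))


def sample_cosine(freq_hz: float, n_samples: int, fs: float = FS) -> list[float]:
    """Sample a cosine of the given frequency at rate fs."""
    return [math.cos(2 * math.pi * freq_hz * k / fs) for k in range(n_samples)]


too_high = 30_000.0                 # above the Nyquist frequency (22050 Hz)
alias = alias_frequency(too_high)   # 14100.0

# Once sampled, both tones produce the same numbers (up to rounding).
a = sample_cosine(too_high, 100)
b = sample_cosine(alias, 100)
print(alias, max(abs(x - y) for x, y in zip(a, b)))
```

After sampling there is no way to tell the two tones apart, which is why content above Fs/2 has to be filtered out before the signal is sampled.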

What happens in the "Real World" when you record audio to be recognized with your Smartphone

Song being played

(Signal you want to detect)

Noise

Noise can be: people talking, background appliances, rain, etc.

Digital audio captured from microphone

(Your recorded wav file)

Solution: We have to design a system to identify x[k] (our song) that is noise resilient.
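A rough sketch of the recording model, assuming the noise simply adds to the song. Only x[k] comes from the slides; the noise n[k], the recording y[k] = x[k] + n[k], the 440 Hz stand-in "song" and the uniform noise are all illustrative assumptions.

```python
import math
import random

FS = 44_100


def mix_with_noise(song: list[float], noise: list[float]) -> list[float]:
    """Recording y[k] = x[k] + n[k], assuming purely additive noise."""
    return [x + n for x, n in zip(song, noise)]


def power(signal: list[float]) -> float:
    """Average power (mean of the squared samples)."""
    return sum(s * s for s in signal) / len(signal)


def snr_db(song: list[float], noise: list[float]) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(song) / power(noise))


rng = random.Random(0)   # fixed seed so the run is repeatable
n_samples = FS // 10     # 0.1 seconds of audio
song = [math.sin(2 * math.pi * 440.0 * k / FS) for k in range(n_samples)]
noise = [rng.uniform(-0.1, 0.1) for _ in range(n_samples)]
recording = mix_with_noise(song, noise)

print(round(snr_db(song, noise), 1))
```

The lower the SNR, the harder the identification problem, so the system needs features that survive the added noise.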

LEVEL UP 🚀

Let's move from the time domain

to the frequency domain

The Fourier transform

\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\ e^{-2\pi i x \xi}\,dx

https://en.wikipedia.org/wiki/Fourier_transform

You can find Fourier Transform in the following applications...

Application	FFT Use
5G communication	Modulate signal (like FM radio)
DTT (digital terrestrial television)	Modulate signal & compress video
JPEG file	Compress image data
YouTube	Compress video data to save space

Your first FFT: the sine function.

f(x)=sin(x)

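\hat{f}(\xi) = \frac{1}{2i}\left[\delta\left(\xi - \frac{1}{2\pi}\right) - \delta\left(\xi + \frac{1}{2\pi}\right)\right]

(a pair of Dirac deltas at the frequency of the sine, \xi = \pm 1/(2\pi))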

https://es.wikipedia.org/wiki/Delta_de_Dirac
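The discrete version of this can be checked numerically. Below is a sketch with a naive DFT (slow but short) applied to a sampled sine; N=64 and 5 cycles per window are arbitrary illustrative choices.

```python
import cmath
import math


def dft(samples: list[float]) -> list[complex]:
    """Naive discrete Fourier transform, O(N^2), for illustration only."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


N = 64       # number of samples in the window
CYCLES = 5   # whole sine cycles inside the window

signal = [math.sin(2 * math.pi * CYCLES * k / N) for k in range(N)]
magnitudes = [abs(c) for c in dft(signal)]

# Look for the strongest bin in the first half of the spectrum.
peak_bin = max(range(N // 2), key=lambda m: magnitudes[m])
print(peak_bin)  # 5
```

All the energy lands in bin 5 and its mirror image, bin N-5, which is the sampled counterpart of the two deltas.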

FFT DEMO

https://codepen.io/boyander/full/zXGqey

The FFT is a fast algorithm for computing the discrete Fourier transform (DFT): you input a signal with N samples and it returns a vector of 2^M samples with the frequency content of the N recorded samples.
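For reference, a minimal sketch of one classic FFT variant, the recursive radix-2 Cooley-Tukey algorithm. It is not necessarily what the demo uses, and it only accepts input lengths that are a power of two, which is one reason 2^M sizes are so common.

```python
import cmath


def fft(samples: list[complex]) -> list[complex]:
    """Recursive radix-2 FFT. The input length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]

    even = fft(samples[0::2])   # transform of the even-indexed samples
    odd = fft(samples[1::2])    # transform of the odd-indexed samples

    out = [0j] * n
    for m in range(n // 2):
        # Rotate the odd half before combining it with the even half.
        twiddle = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out


print(fft([1, 1, 1, 1]))  # a constant signal only has energy in bin 0
```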

Here N and M must be decided:

Example:

Fs=44100 Hz. We want the FFT for 1 second of audio (N=44100 samples, 8 bit).

FFT resolution is M=9 (2^9 = 512 samples).
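The numbers in this example can be worked out as follows. One assumption goes beyond the slides: the 1-second recording is cut into consecutive, non-overlapping 512-sample frames, each fed to its own FFT.

```python
FS = 44_100   # sample rate in Hz
N = 44_100    # samples in 1 second of audio
M = 9         # FFT resolution exponent

fft_size = 2 ** M              # 512 samples per FFT
bin_width_hz = FS / fft_size   # frequency spacing between FFT bins
frames = N // fft_size         # full frames in 1 second (assumed framing)
leftover = N % fft_size        # samples that do not fill a last frame


def bin_to_frequency(m: int) -> float:
    """Center frequency in Hz of FFT bin m."""
    return m * FS / fft_size


print(fft_size, bin_width_hz, frames, leftover)  # 512 86.1328125 86 68
print(bin_to_frequency(5))                       # 430.6640625
```

With M=9 each bin covers about 86 Hz. A larger M gives finer frequency detail, but each FFT then spans more samples and so more time.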