CORE Code School
Learn to code from scratch and become a PRO in 10 weeks. Land on your dream job in tech industry.
Marc Pomar - CORE Code School
Nyquist: To record valid audio data, you must ensure your Fs must be at least 2 times the F you want to capture.
Quantisation Error: It is important to choose a proper resolution to ensure you can capture the details in the dynamic range of the signal (peak to peak).
1. Acquire the signal at a uniform sample rate Fs=44100 Hz. Signal is discrete now.
2. Decide the resolution for the Quantization phase. (i.e 16 bits)
3. Process the data!
More about aliasing:
Crazy Helicopter: https://www.youtube.com/watch?time_continue=3&v=yr3ngmRuGUc
Detailed explanation: https://www.youtube.com/watch?v=dNVtMmLlnoE&t=250s
Song being played
(Signal you want to detect)
Noise
Noise can be: people talking, background appliances, rain, etc.
Digital audio captured from microphone
(Your recorded wav file)
Solution: We have to design a system to identify x[k] (our song) that is noise resilient.
Let's move from temporal domain
to frequency domain
The fourier transform
You can find Fourier Transform in the following applications...
Application | FFT Use |
---|---|
5g Communication | Modulate signal (like FM radio) |
TDT Television | Modulate signal & compress video |
JPEG file | Compress image data |
Youtube | Compress video data to save space |
The FFT is a discrete Fourier Transform, you input a signal with N samples and it returns a vector of 2^M samples with the frequency content of the N recorded samples.
Here N and M must be decided:
Example:
Fs=44100, We want the FFT for 1 second (N=44100 samples) (8 bit).
FFT resolution is M=9 (512 samples).
Spectrogram using WebAudio & p5.js
1. Decide a Window Size. (Example: 0.2 seconds)
2. Perform the FFT for each window on the record with overlap between windows of 50% (p=0.5)
A mehtod that allows robust identification in noisy environments.
1st - Find Peaks in Spectrogram
A mehtod that allows robust identification in noisy environments.
2nd - Create Hashes from found peaks
For each peak, find neighborhood peaks, do a hash(F1, F2, t2-t1) and store those hashes in an indexed storage (i.e mysql database).
Hash
MySql Table
Questions? Thank You!! 👏🏻
[DEMO] https://boyander.github.io/inside-shazam/
[CODE] github.com/boyander/identifier-frontend-samplecode
👨🏼💻 > marc@corecode.school
https://www.corecode.school
By CORE Code School
How shazam identifies music
Learn to code from scratch and become a PRO in 10 weeks. Land on your dream job in tech industry.