![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4596630/pgs-slide-main.png)
Web Audio API
How to transmit data over sound
Robert Rypuła
2018.05.17
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4596635/pgs-slide.png)
Presentation overview:
- What is a sound wave and how to store it?
- Web Audio API overview
- Generate sound - speakers
- Hear the sound - microphone
- AnalyserNode
- Modulation techniques
- Spectral Waterfall
- Physical Layer
- Data Link Layer
- Where to find more?
What is a sound wave and how to store it?
Sound waves in air travel as local changes in pressure.
Speed of sound: 320 - 350 m/s (~1200 km/h). Radio waves are almost a million times faster!
Wavelength examples:
- 50 Hz --> 6.7 m
- 1 000 Hz --> 33.5 cm
- 15 000 Hz --> 2.2 cm
Speakers and mics need to have a moving part.
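The wavelength figures above follow from λ = v / f. A quick sketch (the speed of sound v = 335 m/s used here is an assumed mid-range value, not a number from the slides):

```javascript
// Wavelength of a sound wave: lambda = v / f
// speedOfSound = 335 m/s is an assumed mid-range value for air
function wavelength(frequencyHz, speedOfSound = 335) {
  return speedOfSound / frequencyHz; // [m]
}

console.log(wavelength(50));    // 6.7 m
console.log(wavelength(1000));  // 0.335 m (33.5 cm)
console.log(wavelength(15000)); // ~0.0223 m (2.2 cm)
```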
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4881127/jetPlane.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4606411/speaker.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4611017/mic.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4881148/car.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4881155/latop.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4881165/paperclip.png)
Human hearing range: 20 Hz - 20 kHz
FM radio: 30 Hz - 15 kHz
Whistling: 1 kHz - 2.5 kHz
Enough for voice: 300 Hz - 3400 Hz
Web Audio API + mobile phone: 150 Hz - 6000 Hz
How to store a sound wave? Take a sample at a constant interval.
Common sampling rates:
- 44.1 kHz
- 48.0 kHz (almost all phones)
Common sample value precision: 16-bit signed integer
Range: -32768 to 32767
In the Web Audio API it's normalized to: -1, +1
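The normalization mentioned above is a simple division; a sketch of one common convention (dividing by 32768 - the exact rounding a given browser uses may differ):

```javascript
// Convert a 16-bit signed integer sample to the normalized range
// used by the Web Audio API (assumed convention: divide by 32768)
function toFloat(sample16) {
  return sample16 / 32768;
}

console.log(toFloat(-32768)); // -1
console.log(toFloat(0));      // 0
console.log(toFloat(32767));  // just below +1
```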
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4602093/02-sound-wave-one-period.png)
Humans hear up to 20 kHz! Why do we use such a high sampling rate?
The sampling frequency needs to be at least 2x higher than the maximum frequency in the signal.
44.1 kHz / 2 = 22 050 Hz (2 050 Hz of margin for filtering)
48.0 kHz / 2 = 24 000 Hz (4 000 Hz of margin for filtering)
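The aliasing shown on the next slides can be reproduced numerically: at a 10 Hz sample rate, a 7 Hz cosine produces exactly the same samples as a 3 Hz cosine (a sketch, not code from the slides):

```javascript
// Sample a cosine of the given frequency at sampleRate for n samples
function sampleCosine(freqHz, sampleRate, n) {
  const out = [];
  for (let k = 0; k < n; k++) {
    out.push(Math.cos(2 * Math.PI * freqHz * k / sampleRate));
  }
  return out;
}

const real = sampleCosine(3, 10, 10); // below Nyquist (5 Hz): recoverable
const fake = sampleCosine(7, 10, 10); // above Nyquist: aliases down to 3 Hz
const maxDiff = Math.max(...real.map((v, i) => Math.abs(v - fake[i])));
console.log(maxDiff); // ~0: the two signals are indistinguishable after sampling
```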
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4596734/03-nyquist-1.png)
First 3 frames out of 10.
Light line: what we want to store; dark line: what we will recover.
Real frequencies stay below the Nyquist limit; frequencies above it show up as fake (aliased) ones.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4596733/03-nyquist-2.png)
Sample rate assumed in these two slides: 10 Hz
Maximum frequency: 5 Hz
Web Audio API
overview
Audio nodes live inside an AudioContext.
Very flexible - we just connect nodes however we like.
AudioContext
Input nodes:
- OscillatorNode - generates waves
- AudioBufferSourceNode - reads audio files
- MediaStreamAudioSourceNode - accesses the microphone stream
Output node:
- AudioDestinationNode - the final node that represents the speakers
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4606411/speaker.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4611017/mic.png)
Effect nodes:
- BiquadFilterNode - filters: lowpass, highpass, bandpass
- DelayNode - adds delay to the stream
- GainNode - volume control
Visualization/processing nodes:
- AnalyserNode - gives time/frequency domain data (runs an FFT internally)
- ScriptProcessorNode - gives arrays of subsequent samples to your code
Generate sound - speakers
OscillatorNode
oscNode = audioContext.createOscillator();
GainNode
gainNode = audioContext.createGain();
AudioDestinationNode
adNode = audioContext.destination;
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4606411/speaker.png)
connect()
connect()
AudioContext
audioContext = new AudioContext();
oscillatorNode = audioContext.createOscillator();
gainNode = audioContext.createGain();
oscillatorNode.start(); // <-- required!
oscillatorNode.connect(gainNode);
gainNode.connect(audioContext.destination);
<input
type="range" min="0" max="20000" value="440"
onChange="oscillatorNode.frequency
.setValueAtTime(this.value, audioContext.currentTime)"
/>
<input
type="range" min="0" max="1" value="1" step="0.01"
onChange="gainNode.gain
.setValueAtTime(this.value, audioContext.currentTime)"
/>
Hear the sound - microphone
To access the microphone we need the user's permission
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4611000/mic-access.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4611004/recording.png)
Only websites served via HTTPS can access the microphone
MediaStreamAudioSourceNode
micNode = audioContext.createMediaStreamSource(stream);
GainNode
gainNode = audioContext.createGain();
AudioDestinationNode
adNode = audioContext.destination;
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4606411/speaker.png)
connect()
connect()
AudioContext
navigator.mediaDevices.getUserMedia(constraints).then(function (stream) { });
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4611017/mic.png)
var microphoneNode; // <-- IMPORTANT, declare variable outside promise
function connectMicrophoneTo(audioNode) {
var constraints = {
video: false,
audio: true
};
navigator.mediaDevices.getUserMedia(constraints)
.then(function (stream) {
microphoneNode = audioContext.createMediaStreamSource(stream);
microphoneNode.connect(audioNode);
})
.catch(function (error) {
alert(error);
});
}
// ...
function init() {
audioContext = new AudioContext();
gainNode = audioContext.createGain();
connectMicrophoneTo(gainNode);
gainNode.connect(audioContext.destination);
// ...
}
WARNING: microphone connected to speakers - audio feedback may occur!
AnalyserNode
Lets us look at the same audio signal in two different ways.
Time domain ('fftSize' values):
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4886187/01-sound-wave_v2.png)
Frequency domain ('fftSize/2' values):
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4633548/frequency-domain.png)
It looks like the signal has one dominant frequency - and here it is!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4633548/frequency-domain.png)
Frequency domain: '0.5*fftSize' frequency bins
AnalyserNode performs a Discrete Fourier Transform (via the Fast Fourier Transform, FFT)
'X' time domain samples are transformed into 'X/2' frequency bins
Each frequency bin is expressed in decibels: every 20 dB means a 10x difference in amplitude
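The decibel scale above can be checked with a one-liner (the standard dB-to-amplitude formula, not anything specific to the Web Audio API):

```javascript
// Amplitude ratio corresponding to a decibel value: ratio = 10^(dB / 20)
function dbToAmplitudeRatio(db) {
  return Math.pow(10, db / 20);
}

console.log(dbToAmplitudeRatio(20));  // 10   -> +20 dB is 10x the amplitude
console.log(dbToAmplitudeRatio(-40)); // 0.01 -> -40 dB is 100x smaller
```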
MediaStreamSourceNode
micNode = audioContext.createMediaStreamSource(stream);
AnalyserNode
analyserNode = audioContext.createAnalyser();
connect()
AudioContext
navigator.mediaDevices.getUserMedia(constraints) .then(function (stream) { });
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4611017/mic.png)
function init() {
audioContext = new AudioContext();
analyserNode = audioContext.createAnalyser();
analyserNode.fftSize = 1024; // <--- fftSize, only powers of two!
analyserNode.smoothingTimeConstant = 0.8;
connectMicrophoneTo(analyserNode);
}
function getTimeDomainData() {
var data = new Float32Array(analyserNode.fftSize);
analyserNode.getFloatTimeDomainData(data);
return data; // ----> array length: fftSize
}
function getFrequencyData() {
var data = new Float32Array(analyserNode.frequencyBinCount);
analyserNode.getFloatFrequencyData(data);
return data; // ----> array length: fftSize / 2
}
How to use AnalyserNode
- better resolution -> bigger fftSize
- bigger fftSize -> more time domain samples
- more time domain samples -> longer time to collect them
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4918719/fft-res_small.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4918718/fft-res_big.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4918640/1_5wY5lY_h9g_-Ohf2Q9LK2Q.gif)
function getTimeDomainDuration() {
return 1000 * analyserNode.fftSize / audioContext.sampleRate; // [ms]
}
function getFftResolution() {
return audioContext.sampleRate / analyserNode.fftSize; // [Hz/freqBin]
}
function getFrequency(fftBinIndex) {
return fftBinIndex * getFftResolution(); // [Hz]
}
Frequency domain output interpretation
is based on sampleRate as well
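Plugging in the deck's numbers (fftSize 8192 at 48 kHz and 44.1 kHz) - a standalone sketch of the same formulas, with my own function names:

```javascript
// Standalone versions of the formulas above
function timeDomainDurationMs(fftSize, sampleRate) {
  return 1000 * fftSize / sampleRate; // [ms]
}
function fftResolution(fftSize, sampleRate) {
  return sampleRate / fftSize; // [Hz per frequency bin]
}

console.log(timeDomainDurationMs(8192, 48000)); // ~170.7 ms ("171 ms" in the summary)
console.log(timeDomainDurationMs(8192, 44100)); // ~185.8 ms ("186 ms" in the summary)
console.log(fftResolution(8192, 48000));        // 5.859375 Hz per bin
```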
My tests:
The slowest mobile device was able to deliver 4 unique frequency domain outputs per second.
The best fit is fftSize = 8192.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4656442/fft-parameters.png)
Discrete Fourier Transform sandbox ;)
Modulation techniques
Let's assume we want to send 4 different symbols
How to use wave properties to send them?
?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4887491/ASK.png)
Amplitude-Shift Keying
Good only for a few symbols
With more symbols they would be too close to each other
Phase-Shift Keying
Phase data not available in Web Audio API
Not so many symbols per carrier wave
Requires custom-made DSP code
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4887493/PSK.png)
Frequency-Shift Keying
Works out of the box with AnalyserNode
More symbols?
- Expand the bandwidth!
- Increase the FFT resolution!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4887492/FSK.png)
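A minimal FSK mapping sketch: each symbol gets its own tone frequency. The start frequency and spacing below are hypothetical illustration values, not the presentation's exact parameters:

```javascript
// Map a symbol (0..255) to a tone frequency.
// START_HZ and STEP_HZ are hypothetical illustration values.
const START_HZ = 1500; // assumed start of the usable band
const STEP_HZ = 17.58; // assumed spacing between symbol frequencies

function symbolToFrequency(symbol) {
  return START_HZ + symbol * STEP_HZ; // [Hz]
}

console.log(symbolToFrequency(0));   // 1500 Hz
console.log(symbolToFrequency(255)); // ~5983 Hz - still inside the usable band
```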
Spectral Waterfall
How to show frequency domain changes in one row?
Add colors!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4633548/frequency-domain.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4889515/spectral-waterfall.png)
Desktop vs mobile
Most mobile devices block all frequencies above 8 kHz
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4889527/09-spectral-waterfall.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4889572/skip_main.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4889571/skip_5.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4889570/skip_3.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4889572/skip_main.png)
How to make a single frequency more visible?
Merge frequency bins by finding the max in each group (shown above with skip factors 3 and 5)
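The merging step can be sketched as follows (skip factor 3, as in the slides; the function name is mine):

```javascript
// Merge every `skipFactor` adjacent frequency bins into one,
// keeping the maximum value (in dB) of each group
function mergeBins(bins, skipFactor) {
  const merged = [];
  for (let i = 0; i < bins.length; i += skipFactor) {
    merged.push(Math.max(...bins.slice(i, i + skipFactor)));
  }
  return merged;
}

console.log(mergeBins([-90, -35, -80, -88, -87, -86], 3)); // [ -35, -86 ]
```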
Where to allocate our symbols?
Assume fftSize equal to 8192:
- below ~1.5 kHz: too low for a mobile phone speaker
- above ~8 kHz: filtered out by most mobile phones
- the usable band gives ~800 frequency bins
- with skip factor 3 (3 bins merged into 1): ~265 frequency bins
- enough for one byte (256 symbols)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4918993/chuck-norris.png)
Summary
- Modulation: Frequency-Shift Keying
- FFT size: 8192
- Window duration: 186 ms or 171 ms
- Smoothing: disabled
- Freq. bin skipping: 3 merged into 1
- Bandwidth: from ~1.5 kHz to ~6.0 kHz
- Symbols: 256 (1 byte per FFT window)
- Raw symbol rate: 4 FFTs/s
Works on most devices and browsers
Physical Layer
We can get 4 FFTs per second, each carrying 1 byte.
Unfortunately the useful symbol rate will be only 2 bytes/s. Why?
The receiver's FFT windows are not aligned with the transmitter's symbols, so each symbol has to be held for two FFT windows to guarantee that at least one window catches it cleanly (diagram: Tx sends "Hello" while Rx1, Rx2 and Rx3 sample at different offsets between 0 s and 2.5 s).
Synchronization process
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4893662/sync-details.png)
Diagram: waiting for sync -> sync in progress -> sync OK (0-4 s)
Goal: find the "sync" sequence with the highest signal strength
Physical Layer synchronization example:
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4893702/PHY.png)
It takes 5.5 seconds to send "hello world"
256 data symbols (covering the full ASCII range) + 2 sync symbols
More Physical Layer examples:
Data Link Layer
The Physical Layer does not check for errors.
Solution: pack chunks of data into frames!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4893828/DL-frame.png)
Frame structure: Header | Payload | Checksum
Fletcher-8 checksum:
sum0 = (sum0 + halfOfByte) % 0x0F;
sum1 = (sum1 + sum0) % 0x0F;
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4893829/DL-header.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4893828/DL-frame.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4893827/DL-checksum.png)
Header: "marker" + length
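A sketch of the Fletcher-8 checksum built from the two lines on this slide; the nibble order (high half first) and the packing of the two sums into one byte are my assumptions, not details stated in the deck:

```javascript
// Fletcher-8 over the 4-bit halves of each byte (modulo 0x0F).
// Assumptions: high nibble processed first, result packed as (sum1 << 4) | sum0.
function fletcher8(bytes) {
  let sum0 = 0;
  let sum1 = 0;
  for (const byte of bytes) {
    for (const halfOfByte of [(byte >> 4) & 0x0F, byte & 0x0F]) {
      sum0 = (sum0 + halfOfByte) % 0x0F;
      sum1 = (sum1 + sum0) % 0x0F;
    }
  }
  return (sum1 << 4) | sum0;
}

console.log(fletcher8([0x12]).toString(16)); // "43"
```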
Data Link example:
Where to find more?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4895115/Programista_czesc01.png)
Part 1/3
Discrete Fourier Transform
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4895112/Programista_czesc02.png)
Part 2/3
Web Audio API
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4895113/Programista_czesc03.png)
Part 3/3
Self-made network stack
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4895091/github.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4895097/shrek_cat.png)
If you liked it, please give it a star on GitHub
Audio Network project website:
![](https://s3.amazonaws.com/media-p.slid.es/uploads/818975/images/4895790/audio-network-website-en.png)
Thanks for watching!
the end