What is Speech
Given speech audio, generate a transcript
Speech Recognizer
H
e
l
l
o
w
o
r
l
d
Important goal of AI: Historically hard for machines, easy for people
Traditional systems break the problem into several key components:
Audio Wave
Feature representation
Decoder
HMM/WFST)
Acoustic Model
Language model
Gales & Young, 2008
Jurafsky & Martin, 2000
Usually represent words as a sequence of "phonemes"
Phenomes are the distinct units of sound that distinguish words
Dahl et al. 2011
Sound Data
Sound is transmitted as waves. How do we turn waves into numbers?
A wave form of "Hello"
Sound waves are one-dimensional.
At every moment in time, they have a single value based on the height of the wave.
Let’s zoom in on one tiny part of the sound wave and take a look:
To turn this sound wave into numbers, we just record of the height of the wave at equally-spaced points:
This is called Sampling
Two ways to start:
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time
Fourier Transform
Breaks apart the complex sound wave into the simple sound waves that make it up. Once we have those individual sound waves, we add up how much energy is contained in each one.
The end result is a score of how important each frequency range is, from low pitch (i.e. bass notes) to high pitch
Concatenate frames from adjacent windows to form a "spectogram"
Over 300 radio stations are registered in Uganda.
In its 2015 third quarter report, the regulator, UCC
revealed that 292 FM stations were operational
Goal:
To build an iterable Keyword spotting model for Automated Crop Disease and Pest Surveillance in Uganda from radio data
Pipeline
Dataset consists of keywords that occur often in Agricultural talkshows
Next step is to Obtain Audio waveforms of Each Keyword
If you'd like to Donate your voice:
https://rcrops.github.io
Conclusion
akeraben@gmail.com