Founder & Principal Investigator
From the 1950s to 2000s
Events, Entertainment & Media
[Diagram: Meaning, Text, Audio. Labels: Speak, Listen, Machine Understanding, Machine Expression, Level of Difficulty.]
1900s
1950s
1970s
2010s
https://www.youtube.com/watch?v=0rAyrmm7vv0
Voder: World's First Speech Synthesis
(1939)
Audrey: World's First Speech Recognition
(1952)
Could only "recognize" 10 digits (0-9).
IBM Shoebox
(1962)
10+ years later
Can recognize 6 more terms
Why is this so hard?
Phonemes ==> Letters
Hidden Markov Models
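As a rough illustration of how a Hidden Markov Model can pick the most likely hidden sequence behind a series of sounds, here is a minimal Viterbi decoding sketch. The two phoneme states ("s", "iy"), the observation labels ("hiss", "tone"), and every probability are invented for this example and are not taken from any real recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the previous state on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then walk the back-pointers to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (all values are assumptions made up for illustration).
STATES = ["s", "iy"]
START_P = {"s": 0.6, "iy": 0.4}
TRANS_P = {"s": {"s": 0.7, "iy": 0.3}, "iy": {"s": 0.4, "iy": 0.6}}
EMIT_P = {"s": {"hiss": 0.9, "tone": 0.1}, "iy": {"hiss": 0.2, "tone": 0.8}}

path, prob = viterbi(["hiss", "hiss", "tone"], STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['s', 's', 'iy']
```

Real systems typically work in log-probabilities to avoid underflow on long utterances; plain probabilities are used here only to keep the sketch short.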
Natural Language Understanding
Dependency Parsing
Semantic Frames
Logical Forms
Switchboard Transcription Corpus
Better than Humans
Speech Transcription Error Rates

            Switchboard    CallHome
Human       5.9%           11.3%
Machine     5.8%           11.0%

Achieving Human Parity in Conversational Speech Recognition (Xiong, W. et al. - 2016)
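Transcription error rates like these are usually reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. A minimal sketch follows; the function name and the example sentences are illustrative and are not drawn from either corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```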
WSJ & Penn Treebank Corpus
State of the art results
Part of Speech Tagging Accuracy Rates

WSJ         97.44%
Treebank    97.77%

Globally Normalized Transition-Based Neural Networks (Andor et al. - 2016)
Word Vectors
"Cynthia sold the bike for $200."
Frame Semantics
Logic Frames
Abstract Concept
Repeatable Scripts
Encoded Facts
(Knowledge)
Natural Language
Universe of Facts
Logical Form
1) Enumerate possible derivations of facts.
2) Combine, score and rank possible logical forms.
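The two steps above can be sketched with a toy example. The lexicon, predicate names, and weights below are hypothetical, and multiplying weights is only a stand-in for whatever scoring model a real semantic parser would learn.

```python
from itertools import product

# Hypothetical lexicon: each known word maps to candidate predicates with a weight.
LEXICON = {
    "sold": [("Sell", 0.9), ("Trade", 0.4)],
    "bike": [("Bicycle", 0.8), ("Motorcycle", 0.5)],
}


def enumerate_derivations(tokens):
    """Step 1: enumerate every combination of candidate predicates."""
    options = [LEXICON[t] for t in tokens if t in LEXICON]
    return list(product(*options))


def rank_logical_forms(tokens):
    """Step 2: combine candidates into logical forms, score them, rank best first."""
    scored = []
    for derivation in enumerate_derivations(tokens):
        form = " & ".join(f"{pred}(x{i})" for i, (pred, _) in enumerate(derivation))
        score = 1.0
        for _, weight in derivation:
            score *= weight
        scored.append((score, form))
    return sorted(scored, reverse=True)


ranked = rank_logical_forms("cynthia sold the bike".split())
print(ranked[0][1])  # Sell(x0) & Bicycle(x1)
```

Exhaustive enumeration grows quickly with sentence length, so practical systems usually prune candidates (for example with beam search) instead of scoring every combination.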
Platform APIs
Open Source
Intent definition offloads knowledge to developer.
Drastic increases in "action"-oriented queries.
Move from Keyword-Driven to Intent-Driven Targeting
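To make the point about intent definition concrete: the developer, not the platform, supplies the knowledge of what users might say. The sketch below is a hypothetical, platform-neutral illustration; the `Intent` class, the sample utterances, and the word-overlap matching are assumptions for this example and do not reflect any particular vendor's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intent:
    name: str
    sample_utterances: List[str]  # written by the developer


def match_intent(query: str, intents: List[Intent]) -> Optional[str]:
    """Return the intent whose samples share the most words with the query."""
    query_words = set(query.lower().split())
    best_name, best_overlap = None, 0
    for intent in intents:
        for sample in intent.sample_utterances:
            overlap = len(query_words & set(sample.lower().split()))
            if overlap > best_overlap:
                best_name, best_overlap = intent.name, overlap
    return best_name  # None when no sample shares a word with the query


INTENTS = [
    Intent("OrderPizza", ["order a pizza", "i want pizza delivered"]),
    Intent("CheckWeather", ["what is the weather", "will it rain today"]),
]

print(match_intent("will it rain tomorrow", INTENTS))  # CheckWeather
```

Production platforms use trained classifiers rather than word overlap, but the division of labor is the same: the quality of the match depends on the examples the developer provides.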
Knowledge
(Extracted Facts)
Dialog Structure
(Intents)
References
(Web Pages)
Natural Inquiry
(Voice Search)
Knowledge (and how to apply it) will supersede content.