Teaching Alexa to Listen
[ H/IMA - June 2017]
About Me
Garrett Eastham

Founder & Principal Investigator
- AI & Digital Commerce Focus
- CS @ Stanford
- Background in Web Analytics
- Career in Data Science & Product Management
- Prior: Edgecase (founder), Bazaarvoice, RetailMeNot

Today's Talk
History of Speech Recognition
From Recognition to Understanding
Modern Bot Development
Voice in a World-Wide (Text) World
Speech Recognition
From the 1950s to 2000s
Mobile Actions will be Critical

Events, Entertainment & Media
Framing the (Technical) Challenge
[Diagram: Listen (Machine Understanding): Audio -> Text -> Meaning. Speak (Machine Expression): Meaning -> Text -> Audio.]
Definition: Speech Recognition
Level of Difficulty
[Chart: level of difficulty for Audio, Text, and Meaning, with timeline marks at the 1900s, 1950s, 1970s, and 2010s]
Teaching a Machine to Speak
https://www.youtube.com/watch?v=0rAyrmm7vv0

Voder: World's First Speech Synthesis
(1939)
Teaching a Machine to Listen

Audrey: World's First Speech Recognition
(1952)
Could only "recognize" 10 digits (0-9).
Teaching a Machine to Listen

IBM Shoebox
(1962)
10+ years later
Can recognize 6 more terms
Why is this so hard?
Teaching a Machine to Listen
Phonemes ==> Letters

Teaching a Machine to Listen
Hidden Markov Models
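
To make the Hidden Markov Model idea concrete, here is a minimal sketch of Viterbi decoding, the standard way to recover the most likely hidden sequence from an HMM. The phoneme states, the coarse acoustic labels ("hiss", "voiced", "burst"), and every probability are invented for illustration; they are not taken from any real recognizer.

```python
# Toy HMM: hidden states are phonemes, observations are coarse acoustic labels.
# All names and probabilities here are assumptions made up for this sketch.
states = ["s", "ih", "k"]
start_p = {"s": 0.8, "ih": 0.1, "k": 0.1}
trans_p = {
    "s": {"s": 0.3, "ih": 0.6, "k": 0.1},
    "ih": {"s": 0.1, "ih": 0.3, "k": 0.6},
    "k": {"s": 0.2, "ih": 0.2, "k": 0.6},
}
emit_p = {
    "s": {"hiss": 0.8, "voiced": 0.1, "burst": 0.1},
    "ih": {"hiss": 0.1, "voiced": 0.8, "burst": 0.1},
    "k": {"hiss": 0.1, "voiced": 0.1, "burst": 0.8},
}


def viterbi(observations):
    """Return (probability, path) for the most likely hidden state sequence."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda candidate: candidate[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda candidate: candidate[0])


prob, path = viterbi(["hiss", "voiced", "burst"])
print(path, prob)
```

With these made-up numbers, the decoded phoneme path for hiss, voiced, burst is s, ih, k. Real systems work in log probabilities to avoid underflow on long utterances.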

Machine Understanding
Teaching a Machine to Understand
Meaning
Text
Audio
Teaching a Machine to Understand
Natural Language Understanding

Dependency Parsing
Semantic Frames


Logical Forms
Teaching a Machine to Understand



Switchboard Transcription Corpus
Better than Humans
Speech Transcription Error Rates

            Switchboard   CallHome
Human       5.9%          11.3%
Machine     5.8%          11.0%
Achieving Human Parity in Conversational Speech Recognition (Xiong, W. et al. - 2016)
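
For context on how transcription error is usually measured, here is a minimal sketch of word error rate (WER), assuming the error rates quoted on this slide are word error rates. The function name and the sample sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("alexa play some jazz music", "alexa play sum jazz")
print(wer)  # one substitution plus one deletion over five words
```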
Teaching a Machine to Understand


WSJ & Penn Treebank Corpus
State of the art results
Part of Speech Tagging
Globally Normalized Transition-Based Neural Networks (Andor et al. - 2016)

Accuracy Rates
WSJ: 97.44%
Treebank: 97.77%
Defining (Word) Meaning
Word Vectors
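
A minimal sketch of the word vector idea: each word becomes a list of numbers, and words with similar meaning sit close together. The three-dimensional vectors below are hand-picked assumptions for illustration; trained embeddings have hundreds of dimensions learned from text.

```python
import math

# Hypothetical toy embeddings, not taken from any trained model.
vectors = {
    "bike": [0.9, 0.1, 0.0],
    "bicycle": [0.85, 0.15, 0.05],
    "sold": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(word):
    """Return the other vocabulary word whose vector is closest to `word`."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine(vectors[word], vectors[w]),
    )


print(nearest("bike"))
```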

Decoding Meaning from Text
"Cynthia sold the bike for $200."
Frame Semantics
Logic Frames
Abstract Concept
Repeatable Scripts
Encoded Facts
(Knowledge)
Top Down: Frame Semantics
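
One way to picture the top-down approach is as slot filling: a predefined frame with named roles is matched against the sentence. The sketch below fills a selling frame from the example sentence "Cynthia sold the bike for $200." The class name, role names, and the single hand-written pattern are assumptions for illustration; real frame parsers learn these mappings rather than hard-coding them.

```python
import re
from dataclasses import dataclass


@dataclass
class CommerceSellFrame:
    """Hypothetical frame with three roles, loosely modeled on a selling event."""
    seller: str
    goods: str
    money: str


# Hand-written pattern for one surface form: "<Seller> sold <Goods> for <Money>."
PATTERN = re.compile(
    r"^(?P<seller>.+?) sold (?P<goods>.+?) for (?P<money>\$\d+(?:\.\d+)?)\.?$"
)


def parse_sale(sentence):
    """Return a filled frame, or None when the sentence does not fit the pattern."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    return CommerceSellFrame(**match.groupdict())


frame = parse_sale("Cynthia sold the bike for $200.")
print(frame)
```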

Bottom Up: Logical Forms

Natural Language
Universe of Facts
Logical Form
Bottom Up: Logical Forms

1) Enumerate possible derivations of facts.
2) Combine, score and rank possible logical forms.
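
The two steps above can be sketched on a toy scale. Here a logical form is just a (predicate, entity) pair, candidates are enumerated from a small lexicon, and each is scored by how many words derived it plus a bonus when it matches a known fact. The fact base, lexicons, and scoring rule are all assumptions for illustration; real semantic parsers learn their scoring from data.

```python
from collections import Counter
from itertools import product

# Hypothetical universe of facts: (predicate, entity) -> value
FACTS = {
    ("price", "bike"): "$200",
    ("seller", "bike"): "Cynthia",
    ("price", "helmet"): "$40",
}

# Hypothetical lexicons mapping words to candidate predicates and entities.
PREDICATE_LEXICON = {
    "cost": ["price"],
    "price": ["price"],
    "sold": ["seller", "price"],
    "who": ["seller"],
}
ENTITY_LEXICON = {"bike": ["bike"], "helmet": ["helmet"]}


def enumerate_forms(question):
    """Step 1: enumerate every (predicate, entity) pair the words can derive."""
    words = question.lower().strip("?").split()
    predicates = [p for w in words for p in PREDICATE_LEXICON.get(w, [])]
    entities = [e for w in words for e in ENTITY_LEXICON.get(w, [])]
    return list(product(predicates, entities))


def rank_forms(question):
    """Step 2: combine duplicate derivations, score them, and rank."""
    counts = Counter(enumerate_forms(question))
    scored = [(form, n + (1 if form in FACTS else 0)) for form, n in counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def answer(question):
    """Execute the top-ranked logical form against the fact base."""
    ranked = rank_forms(question)
    if not ranked:
        return None
    best_form, _ = ranked[0]
    return FACTS.get(best_form)


print(rank_forms("Who sold the bike?"))
print(answer("Who sold the bike?"))
```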
Modern Bot Development
Language Understanding Partner
Platform APIs
Open Source







Intents Provide Needed Constraints

Intent definition offloads knowledge to developer.
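
A minimal sketch of what "offloading knowledge to the developer" can look like: the developer lists intents with sample utterances and named slots, and the system only has to pick among them. The intent names, sample phrases, and matching logic below are hypothetical and do not reproduce the schema of Alexa or any other platform.

```python
import re

# Hypothetical intent definitions: intent name -> sample utterances with {slots}.
INTENTS = {
    "PlayMusicIntent": ["play {genre} music", "put on some {genre}"],
    "GetWeatherIntent": ["what is the weather in {city}", "weather for {city}"],
}


def compile_sample(sample):
    """Turn 'play {genre} music' into a regex with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", sample)
    # Even indexes are literal text, odd indexes are slot names.
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def resolve(utterance):
    """Return (intent name, slot values), or (None, {}) when nothing matches."""
    for intent, samples in INTENTS.items():
        for sample in samples:
            match = compile_sample(sample).match(utterance.strip())
            if match:
                return intent, match.groupdict()
    return None, {}


print(resolve("Play jazz music"))
print(resolve("weather for Austin"))
```

Production platforms generalize from the samples with statistical models instead of exact patterns, but the constraint is the same: the utterance must map to one of the intents the developer declared.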
Voice in a World-Wide (Text) World
Voice Search Rapidly Growing

Drastic increases in "action" oriented queries.
Move from Keyword to Intent Driven Targeting
Semantic Search will Evolve
Knowledge
(Extracted Facts)
Dialog Structure
(Intents)
References
(Web Pages)
Natural Inquiry
(Voice Search)

Knowledge (and how to apply it) will supersede content.
This is Not a New Concept

How to Contact Me
garrett@dataexhaust.io
- Follow up questions
- Model development / implementation
- Product team training
- Moral support
Thank You!