Teaching Alexa to Listen

[ H/IMA - June 2017]

About Me

Garrett Eastham

Founder & Principal Investigator

  • AI & Digital Commerce Focus
  • CS @ Stanford
  • Background in Web Analytics
  • Career in Data Science & Product Management
  • Prior: Edgecase (founder), Bazaarvoice, RetailMeNot

Today's Talk

History of Speech Recognition

From Recognition to Understanding

Modern Bot Development

Voice in a World-Wide (Text) World

Speech Recognition

From the 1950s to 2000s

Mobile Actions will be Critical

Events, Entertainment & Media

Framing the (Technical) Challenge

Listen (Machine Understanding): Audio -> Text -> Meaning

Speak (Machine Expression): Meaning -> Text -> Audio

Definition: Speech Recognition

Level of Difficulty

Meaning

Text

Audio

1900s

1950s

1970s

2010s

Teaching a Machine to Speak

https://www.youtube.com/watch?v=0rAyrmm7vv0

Voder: World's First Speech Synthesis
(1939)

Teaching a Machine to Listen

Audrey: World's First Speech Recognition
(1952)

Could only "recognize" 10 digits (0-9).

Teaching a Machine to Listen

IBM Shoebox
(1962)

10+ years later

Can recognize 6 more terms

Why is this so hard?

Teaching a Machine to Listen

Phonemes ==> Letters

Teaching a Machine to Listen

Hidden Markov Models
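
As a rough illustration of how a Hidden Markov Model can map phonemes to letters, the sketch below runs Viterbi decoding over a toy model. The states, phoneme symbols, and every probability are made-up assumptions for illustration, not values from the talk or from any real recognizer.

```python
import math

# Hypothetical toy model: hidden states are letters, observations are phonemes.
STATES = ["c", "a", "t"]
START = {"c": 0.8, "a": 0.1, "t": 0.1}
TRANS = {
    "c": {"c": 0.1, "a": 0.8, "t": 0.1},
    "a": {"c": 0.1, "a": 0.1, "t": 0.8},
    "t": {"c": 0.3, "a": 0.3, "t": 0.4},
}
EMIT = {
    "c": {"K": 0.8, "AE": 0.1, "T": 0.1},
    "a": {"K": 0.1, "AE": 0.8, "T": 0.1},
    "t": {"K": 0.1, "AE": 0.1, "T": 0.8},
}


def viterbi(observations):
    """Return the most probable hidden-state sequence for the observations."""
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {
        s: (math.log(START[s]) + math.log(EMIT[s][observations[0]]), [s])
        for s in STATES
    }
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Pick the best predecessor state for s.
            score, path = max(
                (best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES
            )
            step[s] = (score + math.log(EMIT[s][obs]), path + [s])
        best = step
    return max(best.values())[1]


decoded = viterbi(["K", "AE", "T"])
print("".join(decoded))
```

Working in log-probabilities avoids numeric underflow on longer phoneme sequences, which is why the scores are summed rather than multiplied.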

Machine Understanding

Teaching a Machine to Understand

Meaning

Text

Audio

Teaching a Machine to Understand

Natural Language Understanding

Dependency Parsing

Semantic Frames

Logical Forms

Teaching a Machine to Understand

Switchboard Transcription Corpus

Better than Humans

Speech Transcription Error Rates

            Switchboard   CallHome
Human       5.9%          11.3%
Machine     5.8%          11.0%

Achieving Human Parity in Conversational Speech Recognition (Xiong, W. et al. - 2016)
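
Transcription error rates like the ones above are typically reported as word error rate: the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The function below is a minimal sketch of that calculation; the function name and the sample sentences are illustrative assumptions, and it assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("teaching alexa to listen", "teaching alexa to glisten"))
```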

Teaching a Machine to Understand

WSJ & Penn Treebank Corpus

State of the art results

Part of Speech Tagging

Globally Normalized Transition-Based Neural Networks (Andor et al. - 2016)

Accuracy Rates

WSJ         97.44%
Treebank    97.77%

Defining (Word) Meaning

Word Vectors
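
One way to make "word vectors" concrete: each word becomes a list of numbers, and words with similar meanings end up pointing in similar directions. The sketch below compares hand-written three-dimensional vectors with cosine similarity. The vectors are invented for illustration; real embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical, hand-made vectors (real ones are learned from large corpora).
VECTORS = {
    "bike": [0.9, 0.1, 0.0],
    "bicycle": [0.85, 0.15, 0.05],
    "sold": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


print(cosine(VECTORS["bike"], VECTORS["bicycle"]))  # close to 1
print(cosine(VECTORS["bike"], VECTORS["sold"]))     # much lower
```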

Decoding Meaning from Text

"Cynthia sold the bike for $200."

Frame Semantics

Logic Frames

Abstract Concept

Repeatable Scripts

Encoded Facts (Knowledge)

Top Down: Frame Semantics
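
To illustrate the top-down idea with the earlier example sentence, the sketch below fills a selling frame with seller, goods, and money roles. The role names are loosely modeled on FrameNet's Commerce_sell frame; the class, the regular expression, and `parse_sell` are hypothetical helpers that only handle this one sentence shape.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommerceSellFrame:
    """A selling event with its participants (roles are illustrative)."""
    seller: str
    goods: str
    money: Optional[str] = None


# Hypothetical pattern: "<seller> sold <goods> [for $<amount>]."
PATTERN = re.compile(
    r"^(?P<seller>\w+) sold (?P<goods>.+?)"
    r"(?: for (?P<money>\$\d+(?:\.\d+)?))?\.?$"
)


def parse_sell(sentence):
    """Fill the frame from a sentence, or return None if it does not fit."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return CommerceSellFrame(
        seller=match.group("seller"),
        goods=match.group("goods"),
        money=match.group("money"),
    )


print(parse_sell("Cynthia sold the bike for $200."))
```

The frame is chosen first and the text is then fitted to its slots, which is what makes the approach top-down.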

Bottom Up: Logical Forms

Natural Language

Universe of Facts

Logical Form

Bottom Up: Logical Forms

1) Enumerate possible derivations of facts.

2) Combine, score and rank possible logical forms.
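
The two steps above can be sketched on a tiny scale: enumerate candidate logical forms from a universe of facts, score each against the question, and run the top-ranked one. The facts, the word-overlap scoring rule, and the function names are all assumptions chosen for brevity; real semantic parsers use learned scoring models.

```python
from itertools import product

# Hypothetical universe of facts as (subject, predicate, object) triples.
FACTS = {
    ("cynthia", "sold", "bike"),
    ("cynthia", "owns", "car"),
    ("marco", "sold", "car"),
}
PREDICATES = sorted({p for _, p, _ in FACTS})
OBJECTS = sorted({o for _, _, o in FACTS})


def candidate_forms(question):
    """Step 1: enumerate every (predicate, object) query the facts allow."""
    words = set(question.lower().strip("?").split())
    for pred, obj in product(PREDICATES, OBJECTS):
        # Score = how many parts of the logical form appear in the question.
        score = (pred in words) + (obj in words)
        yield score, (pred, obj)


def answer(question):
    """Step 2: rank the candidates and execute the best logical form."""
    _, (pred, obj) = max(candidate_forms(question))
    return sorted(s for s, p, o in FACTS if p == pred and o == obj)


print(answer("Who sold the bike?"))
```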

Modern Bot Development

Language Understanding Partner

Platform APIs

Open Source

Intents Provide Needed Constraints

Intent definition offloads knowledge to developer.
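
A minimal sketch of what "offloading knowledge to the developer" can look like: the developer lists sample utterances with named slots for each intent, and the bot only has to pick among them. The intent names, templates, and `match_intent` function are hypothetical and do not follow any specific platform's schema.

```python
import re

# Hypothetical intents, each with developer-written sample utterances.
INTENTS = {
    "PlayMusicIntent": ["play {artist}", "put on some {artist}"],
    "SetTimerIntent": ["set a timer for {minutes} minutes"],
}


def _to_regex(template):
    """Turn 'play {artist}' into a regex with a named group per slot."""
    parts = re.split(r"\{(\w+)\}", template)
    # Odd-indexed parts are slot names; even-indexed parts are literal text.
    pattern = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$", re.IGNORECASE)


COMPILED = [
    (name, _to_regex(template))
    for name, templates in INTENTS.items()
    for template in templates
]


def match_intent(utterance):
    """Return (intent name, slot values), or (None, {}) if nothing matches."""
    for name, regex in COMPILED:
        match = regex.match(utterance.strip())
        if match:
            return name, match.groupdict()
    return None, {}


print(match_intent("Set a timer for 10 minutes"))
```

Because the set of intents is closed, the system never has to interpret open-ended language, only choose the nearest predefined action.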

Voice in a World-Wide (Text) World

Voice Search Rapidly Growing

Drastic increases in "action"-oriented queries.

Move from Keyword to Intent-Driven Targeting

Semantic Search will Evolve

Knowledge (Extracted Facts)

Dialog Structure (Intents)

References (Web Pages)

Natural Inquiry (Voice Search)

Knowledge (and how to apply it) will supersede content.

This is Not a New Concept

How to Contact Me

garrett@dataexhaust.io

  • Follow up questions
  • Model development / implementation
  • Product team training
  • Moral support

Thank You!
