Web Speech:
Past, Present,
Future and Opportunity

Interesting historical note

There's a history of us trying to make machines that 'speak' or 'listen' going back to at least the 1700s


Bell Labs demonstrated electronic voice synthesis at the World's Fair in 1939


If you find this interesting, I wrote a whole
piece on The History of Speech


Who remembers this movie?

1983

Me in 1983:

Me...

... I was easily distracted.

The Web: Early 90s - the web really brought me back to computers

<object ID="AgentControl" width="0" height="0" CLASSID="clsid:D45FD31B-5C6E-11D1-9EC1-00C04FD7081F" CODEBASE="http://server/path/msagent.exe#VERSION=2,0,0,0">
</object>

1997!!!

Me in 1997:

WebSpeech

The path to the current state of affairs...

1998 CSS2 Aural Stylesheets

March 1999
AT&T Corporation, IBM, Lucent, and Motorola formed the VoiceXML Forum

Handed over to W3C in 2000
Ask Jeeves, AT&T, Avaya, BT, Canon, Cisco, France Telecom, General Magic, Hitachi, HP, IBM, isSound, Intel, Locus Dialogue, Lucent, Microsoft, Mitre, Motorola, Nokia, Nortel, Nuance, Philips, PipeBeach, SpeechWorks, Sun, Telecom Italia, Tellme.com, and Unisys

Mayyyybbbeeee?

Like HTML, but for... wait... I'm confused.

So many XMLs

For the next decade...

2010: HTML Speech Incubator Group (XG)

let's sidestep this mess...

Competing draft proposals from:

  • Google
  • Microsoft
  • Mozilla
  • Voxeo

Stir....

~ kinda a spec
 

Next stop:
Working Group

LOL jk

2012: Now a Community Group!

~"Here's what we're shipping"

Buuuuuttt....

The Web: Present Day

Almost :(

The Web: Present Day

  • Parts of it work everywhere (almost; better than many 'standards' in history)
  • No RECs here, nor even earlier maturity stages; none of this is standards-tracked
  • The actual de facto standard isn't written down
  • Lots of partial successes
    • Lots of interest from developers!
    • No one is prioritizing this
  • It is historically weird (like MathML)
  • Here's how it works (and doesn't); see the snippet after this list
  • It needs attention to resolve its weirdness and architecture, but nobody is doing that work
  • The world is quite a bit different now, too
  • I have some thoughts about how to resolve this
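
To make the weirdness concrete, here's a minimal sketch of how it's actually used today: synthesis is event-based rather than promise-based, and recognition is typically reachable only behind a vendor prefix, tied to whatever service the engine happens to use.

// Text to speech: available nearly everywhere, but event based, not promise based.
const utterance = new SpeechSynthesisUtterance('Hello from the Web Speech API');
utterance.onend = () => console.log('done speaking');
speechSynthesis.speak(utterance);

// Speech to text: usually only exposed behind a vendor prefix.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
if (Recognition) {
  const recognition = new Recognition();
  recognition.onresult = (event) => {
    console.log('heard:', event.results[0][0].transcript);
  };
  recognition.onerror = (event) => console.warn('recognition error:', event.error);
  recognition.start();
} else {
  console.warn('No speech recognition in this browser');
}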

We should ultimately have a well-explained, layered architecture with escape hatches (a rough sketch follows the list below)

  • A promise based API

  • Feature detection

  • A sensible voice model for TTS, like fonts.

  • Constructors / a replaceable service provider
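
As a purely hypothetical sketch of that shape (speakText, canSynthesize and canRecognize are names invented here for illustration, not anything specified or shipping), a promise-based, feature-detectable layer over today's API might look roughly like this:

// Hypothetical wrapper; illustrates the layering being argued for, not a real API.
function canSynthesize() {
  return 'speechSynthesis' in window;
}

function canRecognize() {
  return 'SpeechRecognition' in window || 'webkitSpeechRecognition' in window;
}

// Promise-based speak(), with a voice request that falls back gracefully,
// a bit like a font stack falling back to whatever is installed.
function speakText(text, { voiceName } = {}) {
  return new Promise((resolve, reject) => {
    if (!canSynthesize()) {
      reject(new Error('speech synthesis unavailable'));
      return;
    }
    const utterance = new SpeechSynthesisUtterance(text);
    const voice = speechSynthesis.getVoices().find((v) => v.name === voiceName);
    if (voice) utterance.voice = voice;
    utterance.onend = () => resolve();
    utterance.onerror = (event) => reject(event.error);
    speechSynthesis.speak(utterance);
  });
}

// Usage (the voice name is illustrative):
// speakText('You are due for an oil change soon', { voiceName: 'Samantha' }).catch(console.warn);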

Lots of service APIs

Even at the OS level!

Lots of discussion, but no driver.

So what's holding it up?

It lacks a driver, especially one without a vested interest in both ends of it

A few proposals.

Basic speech support on Linux would be great for WebKit in general :(

Opportunity and advantage for Igalia?

Voice notifications?

Hands free assistance? (keep that screen clean)

"Play Frozen for the Kids in the back"

 

"Turn on/off/up the AC/Heat"

Voice notifications?
"You're due for an oil change soon"
 

 

 

WPE could have an advantage with speech.

 

Igalia could position itself as a strong driver in fixing the historical problems...

 

Lots of potential funding strategies here.

Thanks.

Web Speech APIs now

By Brian Kardell
