ots and tts

Text To Speech?

or...

Text And Speech?

not your daddy's karaoke machine

TTS... huh, what's it good for?

Audio playback of item prompts, options, and other text content with optional highlighting
Additional TTS content describes item graph axes, diagrams and other graphics
Tools and buttons have tooltips and also play the audio for tooltip content
Page playback begins automatically on each page
Options: volume and playback rate (speed up/down)

The Java Client

Audio generated on-demand by Java client
No time spent downloading audio
Dynamic content: name, school, test; they all work (mostly) fine.
Volume control? easy.
Playback rate? puh-leeze. Easy!

The web-based Test Client

Everything's the same! (not)

All audio is pre-recorded (mp3), nothing on-the-fly
You have to download that stuff.
Volume control? Still easy.
Playback Rate? Well...

...not so much.

More like changing the speed of a record player.

Slow down audio playback...

... or speed it up?

Enter WebKit's Web Audio API

Google for Web Audio API from w3.org
It's not HTML5 Audio (the <audio> element), which is:
- Not very scriptable.
- No event hooks
AudioContext: new AudioContext() (new webkitAudioContext() in older WebKit builds)
- DSP experiments and effects created by connecting multiple nodes in an "audio graph"
See 'Granular Effects' demo that was the initial seed.
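A minimal sketch of the smallest possible audio graph, assuming the standard Web Audio calls (`createBufferSource`, `connect`, `start`). The helper name `playGrain` and its parameters are made up for illustration; this is not the deck's actual player code.

```javascript
// Sketch only: `ctx` is an AudioContext and `buffer` a decoded AudioBuffer.
// playGrain is a hypothetical helper name.
function playGrain(ctx, buffer, whenSec, offsetSec, durationSec) {
  const source = ctx.createBufferSource(); // one-shot source node
  source.buffer = buffer;
  source.connect(ctx.destination);         // simplest graph: source -> output
  // Play `durationSec` of audio, cut from `offsetSec`, at context time `whenSec`.
  source.start(whenSec, offsetSec, durationSec);
  return source;
}

// Browser usage (not run here):
// const ctx = new (window.AudioContext || window.webkitAudioContext)();
// playGrain(ctx, decodedBuffer, ctx.currentTime, 0, 0.05);
```

Effects come from putting more nodes between the source and `ctx.destination`.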

Granular effects

Granular = audio 'grains'

Take tiny slices of audio lasting only milliseconds, and play them sequentially.

If you had 10 seconds (10,000ms) of audio, you could play it as 200 sequential grains each of length 50ms, scheduling them to play 50ms apart.

G0 = 0ms

G1 = 50ms

G2 = 100ms

G3 = 150ms

...

...But what about playing Faster?

Suppose our grains were still 50ms in duration... but we scheduled them to play back closer together?

What if we scheduled a new 50ms grain to play back every... 40ms?

The audio grain playback would overlap slightly, but the ear would hear a "faster" playback.
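The arithmetic, as a rough sketch. `grainSchedule` is a hypothetical helper, not part of any library; it only computes which slice each grain plays and when it is scheduled.

```javascript
// Hypothetical helper: list where each grain is cut from the source and
// when it is scheduled to start. All times are in milliseconds.
function grainSchedule(totalMs, grainMs, intervalMs) {
  const grains = [];
  for (let offset = 0, i = 0; offset < totalMs; offset += grainMs, i++) {
    grains.push({
      index: i,
      sourceOffsetMs: offset,    // where the grain is cut from the audio
      startAtMs: i * intervalMs, // when the grain is scheduled to play
    });
  }
  return grains;
}

const normal = grainSchedule(10000, 50, 50); // 200 grains, 50ms apart
const faster = grainSchedule(10000, 50, 40); // same 200 grains, 40ms apart
```

With 40ms spacing the last grain starts at 7960ms instead of 9950ms, so the same 10 seconds of audio finishes in about 8 seconds.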

...And Slower?

Maybe a tad more complicated.

We may actually need to schedule playback of the very same audio grain multiple times, a few ms apart. Does it work? Mostly...
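One way to picture it, as a sketch under assumptions: keep the output slots one grain apart and let each slot pick its grain by `rate`, so at `rate = 0.5` every grain is used twice. The helper `slowSchedule` and this mapping are illustrative only; the real player's spacing and constants may differ.

```javascript
// Hypothetical helper: for rate < 1, fill fixed output slots and reuse
// source grains so the audio lasts longer. Times are in milliseconds.
function slowSchedule(totalMs, grainMs, rate) {
  const grainCount = Math.ceil(totalMs / grainMs);
  const slotCount = Math.ceil(grainCount / rate);
  const plan = [];
  for (let slot = 0; slot < slotCount; slot++) {
    // Map the output slot back to a source grain; neighbours may repeat.
    const grainIndex = Math.min(grainCount - 1, Math.floor(slot * rate));
    plan.push({ startAtMs: slot * grainMs, sourceOffsetMs: grainIndex * grainMs });
  }
  return plan;
}

// 200ms of audio at half speed: grains 0,0,1,1,2,2,3,3 over 400ms.
const halfSpeed = slowSchedule(200, 50, 0.5);
```

Repeating a grain back to back is exactly where the echo and robot-voice artifacts can creep in.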

Distorting the time & space continuum

Performing DSP (digital signal processing) effects with this JavaScript library is possible, but with caveats:

Grains are scheduled with setTimeout(), and timing in JavaScript is no picnic.
grain size, desired playback rate and playback duration have to be coordinated unless you don't mind
- flange effects
- echoes
- distortion
- evil-robot voices
current constants used in SynchronizedPlayer were obtained using black magic and lots of coffee...
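A common way to limit setTimeout() drift, sketched here as an assumption rather than as what SynchronizedPlayer does: compute each delay against one fixed start time, so a late callback does not push every later grain back too. `delayUntilGrain` and `runGrains` are hypothetical names.

```javascript
// Hypothetical helper: how long to wait until grain `index` is due,
// measured from a fixed start time instead of from the previous callback.
function delayUntilGrain(startedAtMs, index, intervalMs, nowMs) {
  const dueAtMs = startedAtMs + index * intervalMs;
  return Math.max(0, dueAtMs - nowMs); // already late: fire as soon as possible
}

// `now` and `timer` are injectable so the loop can be exercised without real time.
function runGrains(count, intervalMs, onGrain, now = Date.now, timer = setTimeout) {
  const startedAtMs = now();
  const step = (index) => {
    if (index >= count) return;
    onGrain(index);
    const delay = delayUntilGrain(startedAtMs, index + 1, intervalMs, now());
    timer(() => step(index + 1), delay);
  };
  step(0);
}
```

If every callback fires 7ms late, chained fixed delays would run 7, 14, 21ms behind; with this approach each grain stays about 7ms behind.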

Okay, so how does TTS work in the WBTE?

Very well, thank you.

Pre-recorded MP3

Test form items consist of many pieces.

prompt
multiple-choice options
scoregroups
passages
others?

Each one has its own (X|HT)ML-ish markup content. Each piece of markup is fed to TTS generation tools residing at DRC, which generate one MP3 per piece:

12345_prompt.mp3, 12345_option0.mp3, 12345_option1.mp3, etc.

Magical Metadata!

...But that's not all!

Each MP3 is accompanied by a text file (12345_prompt.txt) which contains JSON data like this:


    [
     [<time in MS>,<string offset location>,<string length>],
     [0.028,12,7],
     [0.287,22,19],
     ...
    ]

Combine this with the original markup and the MP3. Then we know when a word is spoken, and where in the markup to find it (and thus highlight it).
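A rough sketch of that lookup. The function names are hypothetical, and it assumes the metadata rows are sorted by time and that the playback position uses the same time unit as the metadata. The `tts-token` class is the one used for tokenized markup elsewhere in this deck.

```javascript
// Metadata rows are [time, stringOffset, stringLength].
// Assumption: rows are sorted by time, same unit as `playbackTime`.
function tokenAt(metadata, playbackTime) {
  let current = null;
  for (const [time, offset, length] of metadata) {
    if (time > playbackTime) break; // this token has not been spoken yet
    current = { offset, length };
  }
  return current; // null before the first token
}

// Wrap the current token in a highlight span, leaving the rest untouched.
function highlight(markup, token) {
  if (!token) return markup;
  const end = token.offset + token.length;
  return markup.slice(0, token.offset) +
    '<span class="tts-token">' + markup.slice(token.offset, end) + "</span>" +
    markup.slice(end);
}

// Made-up markup and offsets, for illustration only.
const sampleMarkup = "<p>Which number is largest?</p>";
const sampleMetadata = [[0.028, 3, 5], [0.287, 9, 6]];
const highlighted = highlight(sampleMarkup, tokenAt(sampleMetadata, 0.3));
```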

Non-form TTS Stuff

There's audio for non-form stuff:

"Funnel" pages
static audio ("You have selected...")
tooltips with tts enabled will play audio

Audio Sprites

Single audio file containing many words or phrases

Sprite metadata indexes keywords
["begin test",0.028,1.213]
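A small sketch of a keyword lookup over that metadata. It assumes each entry is `[keyword, start, end]`; the slide does not say whether the third number is an end time or a duration, so treat that as a guess. `buildSpriteIndex` is a hypothetical name.

```javascript
// Assumption: entries are [keyword, start, end] within the sprite file.
function buildSpriteIndex(entries) {
  const index = new Map();
  for (const [keyword, start, end] of entries) {
    index.set(keyword, { start, duration: end - start });
  }
  return index;
}

const sprites = buildSpriteIndex([["begin test", 0.028, 1.213]]);
const clip = sprites.get("begin test"); // { start, duration } or undefined
```

The `start` and `duration` values are the offset and length you would hand to the player when cutting that phrase out of the sprite.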

So what's the problem?

JavaScript's single-threaded model & setTimeout()
- Start playback and then switch tabs, or minimize...
DSP ain't cheap; try the iPad.
Again with timing: some audio sprite grain scheduling gets... funky. Try mousing over the "Periodic Table" button.
And timing, again... some tokens (highlighted words) are spaced so closely that they get skipped during playback.
Oi, again with the timing... currently we sometimes see token highlighting going on "fast forward".
Go look at the JIRA board for TTS-related bugs.
TTS is really hard to test. Seriously.

What's left to do?

Test Directions, Help Contents, Passages, other?

What could be done better?

Tokenization by Form Services?
- No client-side tokenization effort
- String offsets not needed; metadata could become much smaller.
Alter form JSONs to indicate if/when audio is present, so the client only requests MP3s for form items that actually have them (hard to guess from the markup alone)

You're breaking the girl

Developers alter code that touches TTS, but they don't know it. We have regressions, too, bro.

JATH parsing grabbed only plain text, which excluded all tokenized markup. Result? Whitespace.
```
<span class="tts-token">like this</span>
```
App.bootstraps were designed to block render()
Log into the math99 test with some accommodations; see (and hear) the test in a whole new light.
```
kristts# or krisboth#
```