Grain size, desired playback rate, and playback duration have to be coordinated, unless you don't mind:
flange effects
echoes
distortion
evil-robot voices
The current constants used in SynchronizedPlayer were obtained using black magic and lots of coffee...
Okay, so how does TTS work in the WBTE?
Very well, thank you.
Pre-recorded MP3
Test form items consist of many pieces.
prompt
multiple-choice options
scoregroups
passages
others?
Each one has its own (X|HT)ML-ish markup content. Each piece of markup is fed to the TTS generation tools residing at DRC, which generate an MP3 for it:
12345_prompt.mp3, 12345_option0.mp3, 12345_option1.mp3, etc.
Magical Metadata!
...But that's not all!
Each MP3 is accompanied by a text file (12345_prompt.txt) which contains JSON data like this:
Combine this with the original markup and the MP3. Then we know when a word is spoken, and where in the markup to find it (and thus highlight it).
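A rough sketch of that combination, in case it helps. The metadata shape here (start, end, offset, length) is hypothetical, since the real JSON is not reproduced in these slides, and tokenAt and wrapToken are made-up helper names. The only thing borrowed from the app is the tts-token class name.

```javascript
// Hypothetical metadata: start/end are seconds into the MP3,
// offset/length are character positions in the markup string.
const markup = "<p>Which is true?</p>";
const tokens = [
  { start: 0.0, end: 0.42, offset: 3, length: 5 },   // "Which"
  { start: 0.42, end: 0.9, offset: 9, length: 2 },   // "is"
  { start: 0.95, end: 1.6, offset: 12, length: 4 },  // "true"
];

// Binary search for the token being spoken at `time` (seconds).
// Returns null during silence or outside the recording.
function tokenAt(tokens, time) {
  let lo = 0;
  let hi = tokens.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (tokens[mid].start <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (found === -1 || time >= tokens[found].end) return null;
  return tokens[found];
}

// Wrap the token's slice of the markup in a highlight span.
function wrapToken(markup, token) {
  const end = token.offset + token.length;
  return (
    markup.slice(0, token.offset) +
    '<span class="tts-token">' +
    markup.slice(token.offset, end) +
    "</span>" +
    markup.slice(end)
  );
}
```

For example, wrapToken(markup, tokenAt(tokens, 0.5)) wraps the word "is", because 0.5 seconds falls inside the second token.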
Non-form TTS Stuff
There's audio for non-form stuff:
"Funnel" pages
static audio ("You have selected...")
tooltips with tts enabled will play audio
Audio Sprites
Single audio file containing many words or phrases
Sprite metadata indexes keywords
["begin test",0.028,1.213]
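A minimal sketch of looking up a sprite by keyword. It assumes each metadata entry is [keyword, start, end] with times in seconds; the slides do not say whether the third number is an end time or a duration, so treat that as a guess. buildSpriteIndex, lookupSprite, and playSprite are hypothetical names, and playing through the Web Audio API is also an assumption.

```javascript
// The one entry shown on the slide; assumed layout: [keyword, start, end].
const spriteEntries = [["begin test", 0.028, 1.213]];

// Index entries by keyword for constant-time lookup.
function buildSpriteIndex(entries) {
  const index = new Map();
  for (const [keyword, start, end] of entries) {
    index.set(keyword, { start, end, duration: end - start });
  }
  return index;
}

// Returns { start, end, duration } or null for an unknown keyword.
function lookupSprite(index, keyword) {
  return index.get(keyword) || null;
}

// Browser-only: play one sprite out of a decoded AudioBuffer.
// start(when, offset, duration) plays just that slice of the buffer.
function playSprite(audioContext, buffer, sprite) {
  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);
  source.start(0, sprite.start, sprite.duration);
  return source;
}
```

Letting the audio engine cut the slice (offset plus duration) avoids stopping playback from a JavaScript timer.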
So what's the problem?
JavaScript's single-threaded model & setTimeout()
Start playback and then switch tabs, or minimize...
DSP ain't cheap; try the iPad.
Again with timing: some audio sprite grain scheduling gets... funky. Try mousing over the "Periodic Table" button.
And timing, again... some tokens (highlighted words) are spaced so closely that they get skipped during playback.
Oi, again with the timing... currently we sometimes see token highlighting going on "fast forward".
Go look at the JIRA board for TTS-related bugs.
TTS is really hard to test. Seriously.
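One possible way to take some of the sting out of the timing problems above: instead of chaining setTimeout() calls per token, ask the audio clock where playback is on every tick and jump straight to the matching token. This is a sketch under assumptions, not what SynchronizedPlayer does; makeHighlighter and the start times are made up for illustration.

```javascript
// starts: ascending token start times in seconds (hypothetical values below).
// onHighlight(index) is called only when the active token changes.
function makeHighlighter(starts, onHighlight) {
  let current = -1;
  return function tick(currentTime) {
    // Find the last token whose start time has already passed.
    let index = -1;
    while (index + 1 < starts.length && starts[index + 1] <= currentTime) {
      index++;
    }
    if (index !== current) {
      current = index;
      onHighlight(index);
    }
    return index;
  };
}

// Simulated playback: a throttled tab delivers ticks late and sparsely.
const seen = [];
const tick = makeHighlighter([0, 0.4, 0.45, 1.2], (i) => seen.push(i));
tick(0.1); // token 0
tick(0.1); // no change, no callback
tick(1.3); // jumps straight to token 3, no burst through 1 and 2

// In a browser this might be wired up as (assumption):
// audio.addEventListener("timeupdate", () => tick(audio.currentTime));
```

Because the highlight is derived from the playback position, a late tick lands on the right word instead of replaying a backlog in fast forward. It does not cure skipping: timeupdate may fire only a few times per second, so very short tokens can still be passed over unless the clock is polled more often.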
What's left to do?
Test Directions, Help Contents, Passages, other?
What could be done better?
Tokenization by Form Services?
No client-side tokenization effort
String offsets not needed; metadata could become much smaller.
Alter form JSONs to indicate if/when audio is present; that way the client only attempts to retrieve MP3s for form items that actually have them (this is hard to guess from the markup alone).
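To make the last idea concrete, here is a sketch of what the client side could look like. The audio field on the form item is hypothetical (nothing like it exists in the form JSON today, as far as these slides say), and audioFilesFor is a made-up helper; the file naming follows the 12345_prompt.mp3 pattern shown earlier.

```javascript
// Hypothetical form items: `audio` lists the pieces that have an MP3.
const itemWithAudio = { id: "12345", audio: ["prompt", "option0", "option1"] };
const itemWithoutAudio = { id: "67890" };

// Build the list of MP3 file names to request for one form item.
// Items with no `audio` field produce no requests at all.
function audioFilesFor(item) {
  const pieces = Array.isArray(item.audio) ? item.audio : [];
  return pieces.map((piece) => item.id + "_" + piece + ".mp3");
}

// audioFilesFor(itemWithAudio) lists three files; itemWithoutAudio yields none.
```

The point of the flag is that the client stops guessing from markup and never requests an MP3 that was never generated.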
You're breaking the girl
Developers alter code that touches TTS, but they don't know it. We have regressions, too, bro.
JATH parsing only grabbed text, which excluded all tokenized markup. Result? Whitespace.
<span class="tts-token">like this</span>
App.bootstraps were designed to block render()
Log into the math99 test with some accommodations; see (and hear) the test in a whole new light.