Feasibility of augmenting text with visual prosodic cues to enhance oral reading
Rupal Patel, Heather Kember, Sara Natale
Speech Communication (2014)
Sheng-Fu @2015/02/26
Introduction
Role of prosody in reading
- "Reading with expression" is critical to oral reading fluency
- "Reading with expression" requires modulation of prosody - rhythm and melody of speech.
- Speakers use F0, duration, intensity, and voice quality to convey linguistic and affective goals (Xu, 1999, 2011)
- Listeners use prosodic cues to segment speech into units that support working memory and comprehension.
Hypothesis on explicit visual cues to prosody
- Providing explicit visual cues to prosody would improve reading fluency in beginning readers.
- Beginning readers struggle with prosody when reading aloud
- Written text lacks sufficient prosodic cues
- Developmentally challenging, since prosody must be mastered at the same time as reading itself
Visual cues to prosody
- Previous studies/efforts:
- Intra-sentence spacing (e.g., Levasseur et al. 2006)
- Punctuation manipulation and font case (Blevins, 2001)
- Yet, prosodic variation in natural speech cannot be captured by simple mappings onto such formatting
- The ReadN'Karaoke software (Patel and McNab, 2011)
- Provides visual and auditory prosodic cues to pitch, duration, and intensity
- Visual cues: from manipulated texts to augmented texts
Manipulated vs. Augmented texts
Present study: effects of augmented cues
- To assess ReadN'Karaoke 2.0 with beginning readers using a standardized story.
- Hypotheses:
- more prosodic change with visual cues
- improved reading comprehension with visual cues
Method
Participants
- Eight typically developing children
- 3M, 5F
- Mean = 8.06 years, SD = 0.36, range = 7;7-8;4 (years;months)
- native speakers of American English (AE)
- no reading, speech-language, or hearing problems
- controlled reading level (2nd grade)
Materials
- Baseline and post-training test: a five-chapter story
- approximately 30 sentences per chapter
- at least three occurrences of each of six contour types per chapter
- only visual cues are provided
- Training: published children's books adapted to include targeted prosodic contrasts
- visual cues plus an auditory model are provided
- Augmented texts were generated semi-automatically from recordings of a fluent adult reader.
- Text displayed in five formats: Standard, Pitch (F0 contour), Duration (brackets and spacing), Intensity (mass of vertical pulses), P+D+I
ReadN'Karaoke 2.0
Procedure
- Baseline session (first week):
- recording the 5-chapter story without prosodic cues
- comprehension tests
- Training sessions (2nd to 4th week):
- Each cue was trained individually in the 1st and 2nd sessions
- combined cues in the 3rd session
- cue-mapping quiz prior to the 2nd and 3rd sessions
- Post-training session (5th week):
- cue-mapping quiz
- recording the 5-chapter story, with visual cues only
- comprehension tests (same questions)
Acoustic analysis
- Target words: words within and surrounding each prosodic contour
- e.g., Declarative sentence: final word
- e.g., Contrastive stress: the stressed word
- Cues measured:
- F0 peak
- Intensity
- Word duration
- Pause duration
- speech rate: words/second
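A minimal sketch of how these measures can be computed once each word has onset/offset times and a pitch track is available. The input format (word intervals, e.g. from a forced aligner, and (time, F0) frames with 0 for unvoiced frames) and all names are assumptions for illustration; the authors' actual analysis pipeline is not described here. Mean intensity would follow the same per-word windowing as the F0 peak.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlignedWord:
    word: str
    start_s: float  # word onset in seconds (assumed input, e.g. from forced alignment)
    end_s: float    # word offset in seconds


def word_duration(w: AlignedWord) -> float:
    """Duration of a single word in seconds."""
    return w.end_s - w.start_s


def pause_after(words: List[AlignedWord], i: int) -> float:
    """Silent gap between word i and word i+1; 0 for the last word or no gap."""
    if i + 1 >= len(words):
        return 0.0
    return max(0.0, words[i + 1].start_s - words[i].end_s)


def f0_peak(w: AlignedWord, pitch: List[Tuple[float, float]]) -> Optional[float]:
    """Highest voiced F0 (Hz) inside the word; frames with F0 <= 0 count as unvoiced."""
    voiced = [f0 for t, f0 in pitch if w.start_s <= t <= w.end_s and f0 > 0]
    return max(voiced) if voiced else None


def speech_rate(words: List[AlignedWord]) -> float:
    """Words per second, from first word onset to last word offset (pauses included)."""
    total = words[-1].end_s - words[0].start_s
    return len(words) / total
```

For example, three words spanning 0.0 to 1.5 s give a speech rate of 3 / 1.5 = 2 words/second; whether pauses are included in the denominator is itself an assumption that changes the result.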
Results
Word duration
- Analysis restricted to Contrastive Stress and Phrase-Final Lengthening items
- Main effects of testing time and cue condition
- Significant testing time x cue condition interaction
- Longer word durations post-training, especially in the Duration and Combination conditions
Pause duration
- Analysis restricted to Noun/Adj List and Phrase-Final Lengthening items
- No main effect of testing time
- Significant testing time x cue condition interaction
- Longer pause durations post-training in the Duration cue condition
Peak pitch
- Analysis restricted to Contrastive Stress, Exclamatory Stress, and Phrase-Final Lengthening items
- Significant main effect of testing time
- Higher F0 peaks post-training, regardless of cue condition
Speech rate
- Significant testing time x cue condition interaction
- Faster post-training speech rate in the standard condition
No significant results for...
- Mean intensity
- Reading comprehension
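To make "main effect of testing time" versus "testing time x cue condition interaction" concrete, the sketch below computes the descriptive change (mean post-training minus mean baseline) per cue condition. A main effect of testing time corresponds to these changes being non-zero on average; an interaction corresponds to the changes differing between conditions (e.g., larger in Duration than in Standard). The data layout, the "baseline"/"post" labels, and the function name are assumptions, and the authors' actual significance test is not reproduced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One observation: (cue condition, testing time, measured value),
# e.g. ("Duration", "post", 0.41). Testing time is assumed to be "baseline" or "post".
Observation = Tuple[str, str, float]


def change_by_condition(obs: Iterable[Observation]) -> Dict[str, float]:
    """Return mean(post) - mean(baseline) for every cue condition that has both."""
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for cond, time, value in obs:
        cells[(cond, time)].append(value)
    conditions = sorted({cond for cond, _ in cells})
    return {
        c: mean(cells[(c, "post")]) - mean(cells[(c, "baseline")])
        for c in conditions
        # skip conditions missing either testing time instead of failing on an empty mean
        if (c, "post") in cells and (c, "baseline") in cells
    }
```

Comparing these per-condition changes is only descriptive; whether they differ reliably is what the interaction test in the study addresses.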
Discussion and Conclusion
Summary of results
- Post-training:
- Longer word duration and higher F0 peak regardless of the cue condition
- Longer pause duration only in the Duration cue condition (presented alone, not combined)
- Faster speech rate only in the standard condition
- No effects on intensity or comprehension scores
The effect of pitch cues
- Generalization after a brief training period
- overall post-training improvement
- Pitch modulation is learned and applied across various sentence types
The null effect of intensity cues
- Developmental explanation:
- Intensity control is a later developing skill (Stathopoulos and Sapienza, 1997)
- The survey showed that participants understood the information conveyed by the intensity cue, so the null effect is unlikely to reflect a failure to interpret it
Reading comprehension
- The null result is likely due to a ceiling effect
- The questions may be too easy for the participants
- At least, the visual cues did not distract participants or make the task harder
- (Caveat: the same questions were used at baseline and post-training)
Speech rate
- Increased speech rate in the standard (non-cue) condition
- familiarity with the story from the baseline session
- in the cued conditions, trying to apply the cues slowed down speech rate
Usability survey
- All participants liked the software and the materials!
- enjoyed "figuring the cues out"
- liked "focusing on the words and saying them how you're supposed to"
- learned to "focus on important words"
- Participants' feedback for future use
- the combination condition is confusing
- not enough training on individual cues
Conclusion
- Supplementing text with visual prosodic cues enhances oral reading expressivity in early readers.
- Future work (for the method/software)
- extended training paradigm for long term learning
- gradually diminishing cues to standard texts only
- application to L2 learners, e.g., learning English prosody
By sftwang0416