Feasibility of augmenting text with visual prosodic cues to enhance
oral reading

Rupal Patel, Heather Kember, Sara Natale

Speech Communication (2014)

Sheng-Fu @2015/02/26

Introduction

Role of prosody in reading

  • "Reading with expression" is critical to oral reading fluency   
  • "Reading with expression" requires modulation of prosody - rhythm and melody of speech.
  • Speakers use F0, duration, intensity, voice quality to convey linguistic and affective goals (Xu, 1999, 2011)
  • Listeners use prosodic cues to segment speech into units that support working memory and comprehension.

Hypothesis on explicit visual cues to prosody

  • Providing explicit visual cues to prosody would improve reading fluency in beginning readers.
    • Beginning readers struggle with prosody when reading aloud
      • Lack of sufficient cues in written texts
      • Developmentally challenging as it is simultaneous with reading acquisition

Visual cues to prosody

  • Previous studies/efforts: 

    • Intra-sentence spacing (e.g., Levasseur et al. 2006)
    • Punctuation manipulation and font case (Blevins, 2001)
    • Yet, prosodic variation in natural speech cannot be captured by simple mappings with such formatting
  • The ReadN'Karaoke software (Patel and McNab, 2011)

    • providing visual and auditory prosodic cues to pitch, duration, intensity

    • Visual: from manipulated texts to augmented texts

Manipulated vs. Augmented texts

Present study: effects of augmented cues

  • To assess ReadN'Karaoke 2.0 with beginning readers using a standardized story.
  • Hypotheses:
    • more prosodic change with visual cues
    • improved reading comprehension with visual cues

Method

Participants

  • Eight typically developing children
    • 3M, 5F
    • Mean = 8.06 years, SD = 0.36, range = 7;7-8;4
    • native speakers of American English (AE)
    • no reading, speech-language, or hearing problems
    • controlled reading level (2nd grade)

Materials

  • Baseline and post-training test: a five-chapter story

    • approximately 30 sentences per chapter

    • at least 3 occurrences of six contour types per chapter
    • only visual cues are provided
  • Training: published children's books adapted to include targeted prosodic contrasts
    • both visual cues and an auditory model are provided
  • Augmented texts were generated semi-automatically from recordings of a fluent adult reader.
    • Text displayed in five formats: Standard, Pitch (F0 contour), Duration (brackets and spacing), Intensity (mass of vertical pulses), P+D+I

ReadN'Karaoke 2.0

Procedure

  • Baseline session (first week):
    • recording the 5-chapter story without prosodic cues
    • comprehension tests
  • Training sessions (2nd to 4th week):
    • Each cue was trained individually in 1st and 2nd sessions
    • combined cues in 3rd session
    • cue-mapping quiz prior to 2nd and 3rd sessions
  • Post-training session (5th week):
    • cue-mapping quiz
    • recording the 5-chapter story, with visual cues only
    • comprehension tests (same questions)

Acoustic analysis

  • Target words: words within and surrounding each prosodic contour
    • e.g., Declarative sentence: final word
    • e.g., Contrastive stress: word with stress
  • Cues measured: 
    • F0 peak
    • Intensity
    • Word Duration
    • Pause Duration
    • speech rate: words/second
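
The timing measures above can be sketched in a few lines, assuming word-level start and end times are already available from some alignment step. The `Word` record, the function names, and the example timings are hypothetical illustrations, not the authors' analysis pipeline; F0 peak and intensity are left out because they require a signal-processing tool.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One aligned word; times are in seconds (hypothetical record layout)."""
    text: str
    start: float
    end: float


def word_duration(word: Word) -> float:
    """Duration of a single target word in seconds."""
    return word.end - word.start


def pause_after(words: list[Word], i: int) -> float:
    """Silent gap between word i and the following word (0 if they abut)."""
    return max(0.0, words[i + 1].start - words[i].end)


def speech_rate(words: list[Word]) -> float:
    """Words per second over the whole stretch, pauses included."""
    total_time = words[-1].end - words[0].start
    return len(words) / total_time


# Made-up timings for illustration only; not data from the study.
words = [
    Word("the", 0.0, 0.25),
    Word("big", 0.25, 0.75),
    Word("dog", 1.0, 1.5),
]

print(word_duration(words[1]))   # duration of "big"
print(pause_after(words, 1))     # pause between "big" and "dog"
print(speech_rate(words))        # words per second
```

Counting pauses inside the total time is one possible definition of speech rate; the slides only state the unit (words/second), so this choice is an assumption.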

Results

Word duration

  • Only including Contrastive Stress and Phrase-Final lengthening data
  • Main effects of testing time and cue condition
  • Significant testing time x cue condition interaction
    • post-training has longer duration, especially in Duration and Combination conditions
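
As a reading aid, a testing time x cue condition interaction means that the baseline-to-post change differs between cue conditions. A minimal sketch of that idea as a difference of differences follows; the cell means are invented placeholders, not values reported in the study, and the function names are hypothetical.

```python
# Hypothetical cell means of word duration in ms; NOT values from the study.
means = {
    ("baseline", "Standard"): 400,
    ("post", "Standard"): 420,
    ("baseline", "Duration"): 400,
    ("post", "Duration"): 480,
}


def gain(condition: str) -> int:
    """Post-training minus baseline mean for one cue condition."""
    return means[("post", condition)] - means[("baseline", condition)]


def interaction_contrast(cond_a: str, cond_b: str) -> int:
    """Difference of differences: how much larger the gain is in cond_a."""
    return gain(cond_a) - gain(cond_b)


print(gain("Standard"))
print(gain("Duration"))
print(interaction_contrast("Duration", "Standard"))
```

A contrast near zero would mean the two conditions changed by about the same amount; whether a non-zero contrast is significant still depends on the statistical test, which this sketch does not perform.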

Pause duration

  • Only including Noun/Adj list and Phrase-Final lengthening data
  • No main effect of testing time
  • Significant testing time x cue condition interaction
    • post-training has longer pause duration in the durational cue condition

Peak pitch

  • Only including Contrastive Stress, Exclamatory Stress, and Phrase-Final Lengthening data
  • Significant main effect of testing time 
    • post-training has higher peak F0, regardless of cue condition

Speech rate

  • Significant testing time x cue condition interaction
    • Faster post-training speech rate in the standard condition

No significant results for...

  • Mean intensity
  • Reading comprehension

Discussion and Conclusion

Summary of results

  • Post-training:
    • Longer word duration and higher F0 peak regardless of the cue condition
    • Longer pause duration only in the isolated Duration cue condition
    • Faster speech rate only in the standard condition
    • No effects found for intensity, comprehension score

The effect of pitch cues

  • Generalizability after brief training period
    • overall post-training improvement
  • Pitch modulation is learned and used for various types of sentences 

The null effect of intensity cues

  • Developmental explanation:
    • Intensity control is a later developing skill (Stathopoulos and Sapienza, 1997)
  • Survey showed that participants understood the information conveyed with the intensity cue

Reading comprehension

  • The null results are likely caused by ceiling effect
    • The questions may be too easy for the participants
    • At least the visual cues did not distract the participants or make reading harder
    • (but the same questions were used at baseline and post-training...)

Speech rate

  • Increased speech rate in the standard (non-cue) condition
    • familiarity
    • trying to employ cues slowed down the speech rate

Usability survey

  • All participants liked the software and the materials! 
    • enjoyed "figuring the cues out"
    • liked "focusing on the words and saying them how you're supposed to"
    • learned to "focus on important words"
  • Participants' feedback for future use
    • the combination condition is confusing
    • not enough training on individual cues

Conclusion

  • Supplementing text with visual prosodic cues enhances oral reading expressivity in early readers.
  • Future work (for the method/software)
    • extended training paradigm for long term learning
    • gradually diminishing cues to standard texts only
    • L2, e.g., learning English prosody