Problem:
- Laughter is non-lexical but communicative
- It provides cues regarding the emotional state of speakers
- It is a heterogeneous phenomenon and highly variable, even within a single speaker
Related Work:
- Identification and acoustic characterization of laughter
- Using SVMs, GMMs and HMMs
- Acoustic features used in prior analyses: F0, MFCC, PLP
Automatic Laughter Recognition using DNN
Bingyan Hu
Method:
- Pre-processing + silence slicing
- Feature Extraction: MFCCs (with delta and delta-delta), pitch and energy, prosodic features
- Training Set: unsupervised pre-training followed by supervised NN training (kaldi::nnet1)
- Validation Set: determine the number of hidden layers and units per layer
- Test Set: NN Classification
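The delta and delta-delta step in the feature extraction bullet can be sketched as follows. This is a minimal illustration using the common regression formula for deltas, not the proposal's actual Kaldi configuration. The function names `delta` and `add_deltas`, the window size of 2, and the choice to repeat edge frames at the boundaries are all assumptions made for the example.

```python
def delta(frames, window=2):
    """Regression-based deltas for a list of per-frame feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond the edges are replaced by the first/last frame (assumption).
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append delta and delta-delta coefficients to every frame."""
    d1 = delta(frames, window)   # first-order differences over time
    d2 = delta(d1, window)       # deltas of the deltas
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy input: one coefficient per frame, rising linearly over six frames.
toy_frames = [[float(t)] for t in range(6)]
toy_with_deltas = add_deltas(toy_frames)
```

With 13 static MFCCs per frame, for example, `add_deltas` would return 39 values per frame, which is the usual input size before pitch, energy, and prosodic features are appended.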
Proposal
Data:
- ICSI Meeting Corpus (LDC2004S02)
- A transcribed corpus of multi-party meeting recordings, with human-made annotations of non-lexical vocalized sounds, including laughter, heavy breath sounds, coughs, etc.
- 29 Meetings, 25h, 16 subjects, 6.2% laughter
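The corpus figures above imply a strongly imbalanced classification task. The sketch below works through the arithmetic, assuming the 6.2% figure is a share of recording time (the source does not say whether it is measured by time or by segments). The function name `class_balance` is hypothetical.

```python
def class_balance(total_hours, positive_share):
    """Derive simple class-imbalance figures from a duration and a class share."""
    if not 0.0 < positive_share < 1.0:
        raise ValueError("positive_share must be strictly between 0 and 1")
    positive_hours = total_hours * positive_share
    negative_hours = total_hours - positive_hours
    return {
        "laughter_hours": positive_hours,
        "other_hours": negative_hours,
        # Accuracy of a classifier that always predicts "no laughter".
        "majority_baseline_accuracy": 1.0 - positive_share,
        "other_to_laughter_ratio": negative_hours / positive_hours,
    }


# Figures from the Data section: 25 h of audio, 6.2% laughter (assumed by time).
stats = class_balance(25.0, 0.062)
```

Under that assumption there are roughly 1.55 hours of laughter, and a classifier that never predicts laughter would already reach about 93.8% accuracy. Precision, recall, or F-score on the laughter class would therefore be more informative than raw accuracy.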