Abkhazia tutorial
forced alignment of speech corpora ...
... made easy
Mathieu Bernard - CoML
Forced alignment
Input
- wav file
    s0102a-sent17.wav
- annotation file
    s0102a-sent17 that's what i <SIL> <NOISE> recall
- lexicon file
    <SIL> SIL
    <NOISE> NSN
    that's dh ae t s
    what w ah t
    i ah
    recall r iy k ao l
Output
- alignment file (alignment at the phone and/or word level)
    s0102a-sent17 0.0000 0.3675 SIL
    s0102a-sent17 0.3675 0.5675 dh that's
    s0102a-sent17 0.5675 0.7675 ae
    s0102a-sent17 0.7675 0.7975 t
    s0102a-sent17 0.7975 1.9275 s
    s0102a-sent17 1.9275 3.0275 SIL
    s0102a-sent17 3.0275 3.0575 w what
    s0102a-sent17 3.0575 3.0875 ah
    s0102a-sent17 3.0875 3.1975 t
    s0102a-sent17 3.1975 3.2275 ah i
    s0102a-sent17 3.2275 3.3875 SIL <SIL>
    s0102a-sent17 3.3875 3.5075 NSN <NOISE>
    s0102a-sent17 3.5075 3.7175 r recall
    s0102a-sent17 3.7175 3.8475 iy
    s0102a-sent17 3.8475 4.2075 k
    s0102a-sent17 4.2075 4.3575 ao
    s0102a-sent17 4.3575 4.4075 l
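The alignment file shown here lists one phone per line (utterance id, start time, stop time, phone), with the word written only on the first phone of each word. A word-level alignment can therefore be derived from the phone-level one. The sketch below is not part of abkhazia; the names `parse_alignment` and `group_words` are hypothetical, and it assumes that a `SIL` phone without a word label is a silence inserted by the aligner and belongs to no word.

```python
from typing import List, NamedTuple, Optional, Tuple


class Phone(NamedTuple):
    utt: str
    start: float
    stop: float
    phone: str
    word: Optional[str]  # set only on the first phone of a word


def parse_alignment(text: str) -> List[Phone]:
    """Parse lines of the form 'utt start stop phone [word]'."""
    phones = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        utt, start, stop, phone = fields[:4]
        word = fields[4] if len(fields) > 4 else None
        phones.append(Phone(utt, float(start), float(stop), phone, word))
    return phones


def group_words(
    phones: List[Phone], silences: Tuple[str, ...] = ("SIL",)
) -> List[Tuple[str, float, float, str]]:
    """Merge consecutive phones into (utt, start, stop, word) intervals.

    Assumption: an unlabeled phone listed in `silences` closes the
    current word and is not attached to any word.
    """
    words = []
    open_word = False  # True while the last word can still be extended
    for p in phones:
        if p.word is not None:
            # a labeled phone starts a new word
            words.append([p.utt, p.start, p.stop, p.word])
            open_word = True
        elif p.phone in silences:
            open_word = False
        elif open_word and words[-1][0] == p.utt:
            # unlabeled phone: extend the current word up to this phone
            words[-1][2] = p.stop
    return [tuple(w) for w in words]
```

Calling `group_words(parse_alignment(text))` on the content of an alignment file returns one interval per word, in order of appearance.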
For this tutorial
- You have
  - raw speech files (e.g. wav)
  - annotations for them (e.g. TextGrid)
- You want
  - an alignment of the text to the speech
  - at the word or phone level
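Since the aligner maps each annotated word to phones through the lexicon, it can be worth checking beforehand that every word of the annotation has a lexicon entry. The following is a minimal sketch, assuming the plain-text formats shown in the input example (`word phone phone ...` for the lexicon, `utt-id word word ...` for the annotation). The function names are hypothetical and are not abkhazia functions.

```python
from typing import Dict, List


def read_lexicon(text: str) -> Dict[str, List[str]]:
    """Map each word to its phones (one 'word phone phone ...' per line)."""
    lexicon = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            lexicon[fields[0]] = fields[1:]
    return lexicon


def missing_words(annotation: str, lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per utterance id, the annotated words absent from the lexicon."""
    missing = {}
    for line in annotation.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        utt, words = fields[0], fields[1:]
        unknown = [w for w in words if w not in lexicon]
        if unknown:
            missing[utt] = unknown
    return missing
```

An empty result means every annotated word has a pronunciation; otherwise the returned words need to be added to the lexicon or handled in some other way before alignment.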
Install abkhazia
- abkhazia is a python3 library
- guidelines: https://coml.lscp.ens.fr/abkhazia/install.html
- or use the docker image:
  docker pull cognitiveml/abkhazia:version-1.0
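After installing, one quick way to confirm that the library is visible to the current Python 3 interpreter is to look the module up without importing it. This is only a sketch: the helper `is_installed` is hypothetical, and the module name `abkhazia` is assumed to match the package name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the library is importable under the name "abkhazia".
print("abkhazia importable:", is_installed("abkhazia"))
```

If this prints `False` inside the environment where the installation was done, the install guidelines linked above are the place to look.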