Abkhazia tutorial

forced alignment of speech corpora ...

... made easy

 

 

 

Mathieu Bernard - CoML

 

https://github.com/bootphon/abkhazia

Forced alignment

Input

  • wav file
s0102a-sent17.wav
  • annotation file
s0102a-sent17 that's what i <SIL> <NOISE> recall
  • lexicon file
<SIL> SIL
<NOISE> NSN
that's dh ae t s
what w ah t
i ah
recall r iy k ao l

Output

  • alignment file
    • alignment at the phone and/or word level
s0102a-sent17 0.0000 0.3675 SIL
s0102a-sent17 0.3675 0.5675 dh that's
s0102a-sent17 0.5675 0.7675 ae
s0102a-sent17 0.7675 0.7975 t
s0102a-sent17 0.7975 1.9275 s
s0102a-sent17 1.9275 3.0275 SIL
s0102a-sent17 3.0275 3.0575 w what
s0102a-sent17 3.0575 3.0875 ah
s0102a-sent17 3.0875 3.1975 t
s0102a-sent17 3.1975 3.2275 ah i
s0102a-sent17 3.2275 3.3875 SIL <SIL>
s0102a-sent17 3.3875 3.5075 NSN <NOISE>
s0102a-sent17 3.5075 3.7175 r recall
s0102a-sent17 3.7175 3.8475 iy
s0102a-sent17 3.8475 4.2075 k
s0102a-sent17 4.2075 4.3575 ao
s0102a-sent17 4.3575 4.4075 l
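
The annotation and the lexicon have to agree: each token in the annotation line is looked up in the lexicon to get its phones. Below is a minimal Python sketch of that check, using the example data from this slide. It is not part of abkhazia; the function names `read_lexicon` and `missing_words` are made up for illustration, and the file layout (utterance id followed by words, word followed by phones) is assumed from the example above.

```python
# Illustrative helpers, not abkhazia's API: check that every word of an
# annotation line has an entry in the lexicon.

def read_lexicon(lines):
    """Map each word to its list of phones (format: 'word phone phone ...')."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip empty lines
            lexicon[fields[0]] = fields[1:]
    return lexicon


def missing_words(annotation_line, lexicon):
    """Return the words of 'utt-id word word ...' that have no lexicon entry."""
    _utt_id, *words = annotation_line.split()
    return [word for word in words if word not in lexicon]


LEXICON = """\
<SIL> SIL
<NOISE> NSN
that's dh ae t s
what w ah t
i ah
recall r iy k ao l
"""

lexicon = read_lexicon(LEXICON.splitlines())
annotation = "s0102a-sent17 that's what i <SIL> <NOISE> recall"
print(missing_words(annotation, lexicon))
```

An empty list means every token, including the `<SIL>` and `<NOISE>` markers, has a pronunciation.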

For this tutorial

  • You have
    • raw speech files             (e.g. wav)
    • annotations for them      (e.g. TextGrid)
  • You want
    • alignment of the text with the speech
    • at word or phone level
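
Word-level timings can be derived from a phone-level alignment file like the one on the previous slide. The Python sketch below is illustrative only and not part of abkhazia; the function name `word_intervals` is made up. It assumes each line reads `utt-id start stop phone [word]`, that the word label appears only on the first phone of a word, and that an unlabeled `SIL` phone is a pause between words rather than part of one.

```python
# Illustrative sketch, not abkhazia's API: group a phone-level alignment
# into word-level intervals.

def word_intervals(lines, silences=("SIL",)):
    """Return (utt_id, start, stop, word) tuples from phone alignment lines."""
    words = []
    current = None  # word being built: [utt_id, start, stop, word]
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        utt, start, stop, phone = fields[0], float(fields[1]), float(fields[2]), fields[3]
        label = fields[4] if len(fields) > 4 else None
        starts_word = label is not None
        # Assumption: an unlabeled silence, or a new utterance, ends the word.
        breaks = (phone in silences and not starts_word) or (
            current is not None and current[0] != utt)
        if current is not None and (starts_word or breaks):
            words.append(tuple(current))
            current = None
        if starts_word:
            current = [utt, start, stop, label]
        elif current is not None:
            current[2] = stop  # extend the current word to this phone's end
    if current is not None:
        words.append(tuple(current))
    return words


ALIGNMENT = """\
s0102a-sent17 0.0000 0.3675 SIL
s0102a-sent17 0.3675 0.5675 dh that's
s0102a-sent17 0.5675 0.7675 ae
s0102a-sent17 0.7675 0.7975 t
s0102a-sent17 0.7975 1.9275 s
s0102a-sent17 1.9275 3.0275 SIL
s0102a-sent17 3.0275 3.0575 w what
s0102a-sent17 3.0575 3.0875 ah
s0102a-sent17 3.0875 3.1975 t
s0102a-sent17 3.1975 3.2275 ah i
"""

for utt, start, stop, word in word_intervals(ALIGNMENT.splitlines()):
    print(utt, start, stop, word)
```

With the excerpt above, "that's" spans 0.3675 to 1.9275 and "what" spans 3.0275 to 3.1975; the silences before and between them are left out of the word intervals.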

Install abkhazia

  • abkhazia is a Python 3 library
  • installation guidelines: https://coml.lscp.ens.fr/abkhazia/install.html
  • or use the Docker image:
docker pull cognitiveml/abkhazia:version-1.0
