Speech Technologies

State of the art in speech recognition and voice biometrics
Applying similar techniques to the detection of playback attacks

Piotr Żelasko, MSc Eng.
Academic supervisor: Prof. Mariusz Ziółko

Doctoral seminar, WIEiT AGH, 1 February 2017, Kraków

Goals

Demonstrate state-of-the-art techniques in speech recognition and speaker recognition
Show that deep neural networks enabled a breakthrough in these fields
Outline a potential application to detecting playback attacks on voice biometric systems

Automatic speech recognition

Acoustic model

(traditional)

Deep neural networks

Acoustic model

(state of the art)

Amodei et al., Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016

http://itl.nist.gov/iad/mig/publications/ASRhistory/index.html

Benchmark 2016

https://github.com/syhw/wer_are_we
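
The benchmark list linked above compares speech recognizers by word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch of that computation follows. The function name `word_error_rate` and whitespace tokenization are assumptions made for illustration; real evaluations usually normalize text (case, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.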

Speaker recognition

Sadjadi et al., The IBM 2016 Speaker Recognition System, Odyssey 2016, Bilbao, Spain

Sadjadi et al., MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research, IEEE SLTC Newsletter, November 2013

Benchmark

Sadjadi et al., The IBM 2016 Speaker Recognition System, Odyssey 2016, Bilbao, Spain

Playback detection

The 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof)

Thank you for your attention

Piotr Żelasko

Research scientist at CLSP, Johns Hopkins University. PhD at AGH-UST in Cracow. My interests are automatic speech recognition, natural language processing, C++ and Python, machine learning and deep learning, and jazz music.
