Voice Recognition in Python using Convolutional Neural Networks

Cosmin Catalin Sanda

3 December 2019

ABOUT ME

Cosmin Catalin Sanda

Data Scientist and Engineer at AudienceProject

Blogging at https://cosminsanda.com

Github at https://github.com/cosmincatalin

What is Voice Recognition?

The ability to identify a speaker based on the sound of their voice

VERIFICATION: is the user who they claim to be?
IDENTIFICATION: recognize who the user is.
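
To make the distinction concrete, here is a minimal sketch in plain Python. It assumes each voice has already been reduced to a small feature vector; the vectors, the `verify` and `identify` helpers, and the 0.9 threshold are all hypothetical and only illustrate the difference between a 1-to-1 check and a 1-to-N search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample, claimed_profile, threshold=0.9):
    """Verification: compare the sample against ONE claimed identity."""
    return cosine_similarity(sample, claimed_profile) >= threshold


def identify(sample, profiles):
    """Identification: pick the best match among ALL known identities."""
    return max(profiles, key=lambda name: cosine_similarity(sample, profiles[name]))


# Hypothetical toy profiles; real systems would learn these from audio.
profiles = {"Joanna": [0.9, 0.1, 0.3], "Matthew": [0.2, 0.8, 0.5]}
sample = [0.85, 0.15, 0.35]

claim_accepted = verify(sample, profiles["Joanna"])
best_match = identify(sample, profiles)
```

Verification answers a yes/no question about a single claimed speaker, while identification always returns one of the known speakers.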

The problem at hand

Polly is a text-to-speech service from AWS
It features 8 voices for its English variant

Which of them was used for a given recording?

Salli

Kimberly

Kendra

Joanna

Ivy

Matthew

Justin

Joey

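import random
from contextlib import closing

import boto3

# Requires AWS credentials with access to Polly
client = boto3.client("polly")

# The eight English voices listed above
voices = ["Ivy", "Joanna", "Joey", "Justin",
    "Kendra", "Kimberly", "Matthew", "Salli"]

# Synthesize the sentence with a randomly chosen voice
response = client.synthesize_speech(
    OutputFormat="mp3",
    Text="Polly wants a cracker",
    TextType="text",
    VoiceId=random.choice(voices)
)

# Save the returned audio stream as an MP3 file
with open("out.mp3", "wb") as out:
    with closing(response["AudioStream"]) as stream:
        out.write(stream.read())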

Simplifying the problem

text -> text-to-speech -> sound -> image

[Figure: example images for the voices Joanna and Kimberly]
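
The slide does not say how a sound becomes an image. One common choice is a spectrogram, so the sketch below assumes that technique. It uses only the standard library and a naive DFT, which is slow but easy to follow; the `spectrogram` function, the frame sizes and the synthetic 1000 Hz tone are illustrative assumptions, not the talk's actual code.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces leakage at the frame edges
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        # Naive DFT, keeping only the non-negative frequencies
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic stand-in for a recording: a pure 1000 Hz tone sampled at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

image = spectrogram(tone)
# Each frame is one column of pixels: time runs across frames, frequency across bins
loudest_bin = max(range(len(image[0])), key=lambda k: image[0][k])
```

Each frame becomes one column of pixel intensities, so the whole recording turns into a 2D grid that an image classifier can consume. In practice an FFT routine from a numerical library would replace the naive DFT.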

CONVOLUTIONAL NEURAL NETWORKS

[Figure: original image -> numerical representation -> filtered output (simplification)]

Numerical representation (three 6x6 grids; the second and third differ from the first only in the top-left value, 20 instead of 34):

34	13	54	45	45	34
34	34	34	54	43	34
34	56	34	54	45	23
34	43	34	44	45	56
34	54	45	46	34	6
34	54	56	65	56	56

Filtered output (three identical 3x3 grids):

34	13	54
34	34	34
34	56	34
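
As a rough sketch of how a filter simplifies a numerical image, the code below slides a 3x3 kernel over the first 6x6 grid from the slide and then applies 2x2 max pooling. The slide does not state which filter or pooling it uses, so the vertical-edge kernel and the `cross_correlate` and `max_pool` helpers are assumptions chosen only for illustration.

```python
def cross_correlate(image, kernel):
    """Slide the kernel over the image without padding (the CNN 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(image, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(image[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(image[0]) - size + 1, size)
        ]
        for i in range(0, len(image) - size + 1, size)
    ]


# First 6x6 grid from the slide
image = [
    [34, 13, 54, 45, 45, 34],
    [34, 34, 34, 54, 43, 34],
    [34, 56, 34, 54, 45, 23],
    [34, 43, 34, 44, 45, 56],
    [34, 54, 45, 46, 34, 6],
    [34, 54, 56, 65, 56, 56],
]

# Hypothetical kernel that responds to left-to-right intensity changes
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

features = cross_correlate(image, kernel)  # 6x6 input -> 4x4 feature map
pooled = max_pool(features)                # 4x4 feature map -> 2x2 summary
```

In a trained network the kernel values are learned rather than hand-picked, and many kernels run in parallel, each producing its own feature map.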

Learn how to use a convolutional neural network built with MXNet for the purpose of speaker recognition.
