Voice Recognition in Python using Convolutional Neural Networks

Cosmin Catalin Sanda

3 December 2019

ABOUT ME

Cosmin Catalin Sanda

Data Scientist and Engineer at AudienceProject

Blogging at https://cosminsanda.com

GitHub at https://github.com/cosmincatalin

What is Voice Recognition?

The ability to identify a speaker based on the sound of their voice

VERIFICATION: is the user who they claim to be?
IDENTIFICATION: which known user is speaking?
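
The difference between the two tasks can be sketched in a few lines. This is only an illustration, not the method used in the talk: the feature vectors, the `threshold` value, and the helper names `verify` and `identify` are all made up for the example. Verification compares a sample against one claimed speaker, while identification picks the closest of many enrolled speakers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample, claimed_profile, threshold=0.9):
    """Verification: does the sample match the ONE claimed speaker?"""
    return cosine_similarity(sample, claimed_profile) >= threshold


def identify(sample, profiles):
    """Identification: which of the MANY enrolled speakers is closest?"""
    return max(profiles, key=lambda name: cosine_similarity(sample, profiles[name]))


# Hypothetical "voice profiles": toy feature vectors, not real voice data.
profiles = {
    "Joanna": [0.9, 0.1, 0.3],
    "Matthew": [0.1, 0.8, 0.5],
}
sample = [0.85, 0.15, 0.35]

is_joanna = verify(sample, profiles["Joanna"])    # one-to-one check
is_matthew = verify(sample, profiles["Matthew"])  # one-to-one check
who = identify(sample, profiles)                  # one-to-many search
```

The problem tackled in the rest of the deck is of the identification kind: given a recording, pick one voice out of a fixed set.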

The problem at hand

Polly is a text-to-speech service from AWS
It features 8 voices for its English variant

Which of them was used for a given recording?

Salli

Kimberly

Kendra

Joanna

Ivy

Matthew

Justin

Joey

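import random
from contextlib import closing

import boto3

# Requires AWS credentials and a default region to be configured
client = boto3.client("polly")

# The 8 voices to choose from
voices = ["Ivy", "Joanna", "Joey", "Justin",
    "Kendra", "Kimberly", "Matthew", "Salli"]

# Synthesize the sentence with a randomly chosen voice
response = client.synthesize_speech(
    OutputFormat="mp3",
    Text="Polly wants a cracker",
    TextType="text",
    VoiceId=random.choice(voices)
)

# Save the returned audio stream as an MP3 file
with open("out.mp3", "wb") as out:
    with closing(response["AudioStream"]) as stream:
        out.write(stream.read())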
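
To train a classifier, each recording needs a known label rather than a random voice. The sketch below shows one possible way to prepare such a labelled set. It is an assumption, not the code from the talk: the helper `build_jobs`, the filename pattern, and the second sentence are hypothetical. It only builds the keyword arguments for `synthesize_speech` and does not call AWS.

```python
import itertools

VOICES = ["Ivy", "Joanna", "Joey", "Justin",
          "Kendra", "Kimberly", "Matthew", "Salli"]


def build_jobs(sentences, voices=VOICES):
    """Pair every sentence with every voice.

    Each job holds the keyword arguments for `synthesize_speech`,
    an output filename, and the voice name, which serves as the label.
    """
    jobs = []
    for (i, text), voice in itertools.product(enumerate(sentences), voices):
        jobs.append({
            "filename": f"{voice}_{i:04d}.mp3",  # hypothetical naming scheme
            "label": voice,
            "request": {
                "OutputFormat": "mp3",
                "Text": text,
                "TextType": "text",
                "VoiceId": voice,
            },
        })
    return jobs


# The second sentence is a made-up example.
jobs = build_jobs(["Polly wants a cracker", "Hello world"])
# Each job could then be sent with: client.synthesize_speech(**job["request"])
```

Keeping the label in both the job and the filename means the ground truth survives even if the files are later moved around.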

Simplifying the problem

text -> text-to-speech -> sound -> image

Examples shown: Joanna, Kimberly
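
One common way to turn sound into an image is a spectrogram: the signal is cut into short overlapping frames and the strength of each frequency is measured per frame. The slides do not say which representation is used, so treat the following as an assumption. The function name `spectrogram`, the frame and hop sizes, and the synthetic test tone are all chosen for illustration, and the plain-Python transform is written for readability rather than speed.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame lists one magnitude per frequency bin."""
    bins = frame_size // 2 + 1  # keep non-negative frequencies only
    # A Hann window softens the edges of each frame
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(bins):
            # Discrete Fourier transform of the frame at frequency bin k
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic stand-in for a recording: a pure 1000 Hz tone sampled at 8000 Hz
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

image = spectrogram(signal)
# Index of the strongest frequency bin in the first frame
loudest_bin = max(range(len(image[0])), key=lambda k: image[0][k])
```

Each row of `image` is a moment in time and each column a frequency band, so the result can be treated as a grayscale picture. For real recordings a numerical library would likely be a better fit than this hand-written transform.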

CONVOLUTIONAL NEURAL NETWORKS

Original image -> Numerical representation -> (simplification) -> Filtered output

Numerical representation (three 6x6 grids):

34	13	54	45	45	34
34	34	34	54	43	34
34	56	34	54	45	23
34	43	34	44	45	56
34	54	45	46	34	6
34	54	56	65	56	56

20	13	54	45	45	34
34	34	34	54	43	34
34	56	34	54	45	23
34	43	34	44	45	56
34	54	45	46	34	6
34	54	56	65	56	56

20	13	54	45	45	34
34	34	34	54	43	34
34	56	34	54	45	23
34	43	34	44	45	56
34	54	45	46	34	6
34	54	56	65	56	56

Filtered output (three 3x3 grids):

34	13	54
34	34	34
34	56	34

34	13	54
34	34	34
34	56	34

34	13	54
34	34	34
34	56	34
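
The filtering and shrinking steps can be tried out on the first 6x6 grid from the slide. This is a minimal sketch under stated assumptions: the 3x3 `kernel` is a hypothetical filter chosen for illustration (in a real network the filter values are learned), and 2x2 max pooling is only one possible way a 6x6 grid ends up as 3x3. The slide does not say which operation produced its 3x3 output.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products.

    Strictly speaking this is cross-correlation, which is what
    convolutional layers typically compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(grid, size=2):
    """Keep the largest value of each non-overlapping size x size block."""
    return [
        [
            max(grid[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(grid[0]) - size + 1, size)
        ]
        for r in range(0, len(grid) - size + 1, size)
    ]


# First 6x6 grid from the slide
image = [
    [34, 13, 54, 45, 45, 34],
    [34, 34, 34, 54, 43, 34],
    [34, 56, 34, 54, 45, 23],
    [34, 43, 34, 44, 45, 56],
    [34, 54, 45, 46, 34, 6],
    [34, 54, 56, 65, 56, 56],
]

# Hypothetical filter: right column minus left column (responds to horizontal change)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

filtered = convolve2d(image, kernel)  # 6x6 input, 3x3 kernel -> 4x4 output
downsampled = max_pool(image)         # 6x6 input, 2x2 blocks -> 3x3 output
```

A convolutional network stacks many such filter and pooling steps, so each layer works on a smaller, more abstract version of the picture.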
