Voice Recognition in Python using Convolutional Neural Networks

Cosmin Catalin Sanda

3rd December 2019

ABOUT ME

Cosmin Catalin Sanda

Data Scientist and Engineer at AudienceProject

Blogging at https://cosminsanda.com

GitHub at https://github.com/cosmincatalin

What is Voice Recognition?

The ability to identify a speaker based on the sound of their voice

  • VERIFICATION: is the user who they claim to be?
  • IDENTIFICATION: who is the user?
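
The difference between the two tasks can be sketched in a few lines. This is only an illustration: the similarity scores are invented, the function names are hypothetical, and the 0.8 threshold is an arbitrary assumption rather than anything taken from the talk.

```python
# Hypothetical similarity scores between one recording and each enrolled
# speaker (higher means more similar). The values are invented.
scores = {"Joanna": 0.91, "Kimberly": 0.40, "Matthew": 0.12}


def identify(scores):
    """IDENTIFICATION: pick the most likely speaker among all known ones."""
    return max(scores, key=scores.get)


def verify(scores, claimed, threshold=0.8):
    """VERIFICATION: accept or reject one claimed identity."""
    return scores.get(claimed, 0.0) >= threshold


print(identify(scores))            # best match among all speakers
print(verify(scores, "Joanna"))    # claim checked against the threshold
print(verify(scores, "Kimberly"))
```

Identification is a multi-class decision over every known speaker, while verification is a yes/no decision about a single claimed identity.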

The problem at hand

  • Polly is a text-to-speech service from AWS
  • It features 8 voices for its English variant

Which of these voices was used for a given recording?

  • Salli
  • Kimberly
  • Kendra
  • Joanna
  • Ivy
  • Matthew
  • Justin
  • Joey

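import random
from contextlib import closing

import boto3

# Requires AWS credentials with access to Polly.
client = boto3.client("polly")

# The eight English Polly voices.
voices = ["Ivy", "Joanna", "Joey", "Justin",
          "Kendra", "Kimberly", "Matthew", "Salli"]

# Synthesize the sentence with a randomly chosen voice.
response = client.synthesize_speech(
    OutputFormat="mp3",
    Text="Polly wants a cracker",
    TextType="text",
    VoiceId=random.choice(voices)
)

# Save the returned audio stream as an MP3 file.
with open("out.mp3", "wb") as out:
    with closing(response["AudioStream"]) as stream:
        out.write(stream.read())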

Simplifying the problem

[Diagram: text → text-to-speech → sound → image → voice label (e.g. Joanna, Kimberly)]
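
The sound-to-image step can be sketched as follows. The slide only says that a sound becomes an image. Using a spectrogram for that image is an assumption made here, as are the `spectrogram` helper, the frame sizes and the synthetic 440 Hz tone standing in for a real Polly recording.

```python
import numpy as np


def spectrogram(samples, frame_size=256, hop=128):
    """Return a 2-D array (frequency bins x time frames) of magnitudes in dB."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        samples[i * hop:i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # One FFT per frame gives the frequency content at that moment.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Log scale, as is common for audio; the small constant avoids log(0).
    return 20 * np.log10(magnitudes + 1e-10).T


# A synthetic one-second 440 Hz tone stands in for a real recording.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

image = spectrogram(tone)
print(image.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated like a grayscale picture, which is what makes an image classifier applicable to audio.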

CONVOLUTIONAL NEURAL NETWORKS

[Diagram: original image → numerical representation (grid of pixel values) → filtered output (simplification)]
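
The filtering step can be illustrated with a small sketch. It assumes a square filter, no padding and a stride of 1. The 4x4 grid of pixel values and the 2x2 averaging filter are chosen purely for illustration. In a real network the filter values are learned during training.

```python
def convolve2d(image, kernel):
    """Slide a square `kernel` over `image` (no padding, stride 1).

    Each output value is the sum of element-wise products between the
    kernel and the image patch under it. Like most deep learning
    libraries, this does not flip the kernel (cross-correlation).
    """
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Example pixel values (a 4x4 grayscale grid).
image = [
    [34, 13, 54, 45],
    [34, 34, 34, 54],
    [34, 56, 34, 54],
    [34, 43, 34, 44],
]

# A 2x2 averaging filter, chosen only for illustration.
kernel = [[0.25, 0.25],
          [0.25, 0.25]]

filtered = convolve2d(image, kernel)
for row in filtered:
    print(row)
```

The 4x4 input shrinks to a 3x3 output, which is the "simplification" the slide refers to: each output value summarizes a small neighbourhood of the input.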

Learn how to use a convolutional neural network built with MXNet for the purpose of speaker recognition.
