Cosmin Catalin Sanda
3 December 2019
Cosmin Catalin Sanda
Data Scientist and Engineer at AudienceProject
Blogging at https://cosminsanda.com
Github at https://github.com/cosmincatalin
The ability to identify a speaker based on the sound of their voice
Which of the voices was used for a given recording?
Salli
Kimberly
Kendra
Joanna
Ivy
Matthew
Justin
Joey
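import random
from contextlib import closing

import boto3

# Create a client for Amazon Polly, the AWS text-to-speech service
client = boto3.client("polly")
voices = ["Ivy", "Joanna", "Joey", "Justin",
          "Kendra", "Kimberly", "Matthew", "Salli"]
# Synthesize the sentence with a randomly chosen voice
response = client.synthesize_speech(
    OutputFormat="mp3",
    Text="Polly wants a cracker",
    TextType="text",
    VoiceId=random.choice(voices)
)
# Write the returned audio stream to an MP3 file
with open("out.mp3", "wb") as out:
    with closing(response["AudioStream"]) as stream:
        out.write(stream.read())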
text -> text-to-speech -> sound -> image
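The slides do not say how a sound becomes an image. One common choice is a spectrogram, and the sketch below assumes that. It uses only the standard library and a naive DFT, so it is meant for illustration, not for real recordings. The function name `spectrogram`, the frame size, the hop and the synthetic 1 kHz tone are all assumptions made for this example.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    # Hann window to reduce spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequencies
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Hypothetical input: a synthetic 1 kHz tone sampled at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

# Rows are time frames, columns are frequency bins: a small grayscale "image"
image = spectrogram(tone)
```

Each row of `image` is one time frame and each column one frequency bin, so the result can be treated as pixel intensities. For real audio, a dedicated signal-processing library would be the practical choice.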
Labeled samples: Joanna, Joanna, Joanna, Kimberly, Kimberly, Kimberly
34 | 13 | 54 | 45 | 45 | 34
34 | 34 | 34 | 54 | 43 | 34
34 | 56 | 34 | 54 | 45 | 23
34 | 43 | 34 | 44 | 45 | 56
34 | 54 | 45 | 46 | 34 | 6
34 | 54 | 56 | 65 | 56 | 56

20 | 13 | 54 | 45 | 45 | 34
34 | 34 | 34 | 54 | 43 | 34
34 | 56 | 34 | 54 | 45 | 23
34 | 43 | 34 | 44 | 45 | 56
34 | 54 | 45 | 46 | 34 | 6
34 | 54 | 56 | 65 | 56 | 56

34 | 13 | 54
34 | 34 | 34
34 | 56 | 34
Original image -> Numerical representation -> Filtered output (simplification)
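A minimal sketch of how a filter can turn a numerical representation into a smaller output. The 6 x 6 grid is the first one shown on the slide. The 4 x 4 averaging kernel and the name `filter2d` are assumptions made for this example, and the values it produces are not the ones shown on the slide.

```python
def filter2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Numerical representation: the first 6 x 6 grid from the slide
image = [
    [34, 13, 54, 45, 45, 34],
    [34, 34, 34, 54, 43, 34],
    [34, 56, 34, 54, 45, 23],
    [34, 43, 34, 44, 45, 56],
    [34, 54, 45, 46, 34, 6],
    [34, 54, 56, 65, 56, 56],
]

# Hypothetical 4 x 4 averaging kernel (not taken from the slides)
kernel = [[1 / 16] * 4 for _ in range(4)]

# A 6 x 6 input with a 4 x 4 kernel gives a 3 x 3 filtered output
filtered = filter2d(image, kernel)
```

Each output cell summarizes a 4 x 4 patch of the input, which is the sense in which filtering simplifies the image. In a trained network the kernel weights are learned and not fixed by hand.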