Natural Language Processing
Sahand Khaksar
Mahdi Davoodi
Siri, Cortana, Google Nav, Alexa, and ...
And How the Hell Do They Do That?
Phones and Waves
Hidden Markov Modeling
Speaker recognition: determining who is speaking
Speech recognition: determining what is being said
Verifies the identity of a person
Recognizing when the same speaker is speaking
Simplifies the task of translating speech in systems trained on specific voices
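1:N (identification: one voice is compared against N stored voice prints)
1:1 (verification: one voice is compared against one claimed voice print)
Enrollment: the speaker's voice is recorded and typically a number of features are extracted to form a voice print, template, or model.
Verification: a speech sample or "utterance" is compared against a previously created voice print.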
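The comparison of an utterance against a voice print can be sketched roughly as follows. This is only an illustration: the feature vectors are made-up numbers, the helper names (enroll, verify, identify) are hypothetical, and averaging plus cosine similarity with a fixed threshold is an assumed, simplified stand-in for what a real system would do.

```python
import math

def enroll(feature_vectors):
    """Enrollment: average several feature vectors into one voice print.
    (Hypothetical helper; real systems build richer speaker models.)"""
    n = len(feature_vectors)
    dim = len(feature_vectors[0])
    return [sum(v[i] for v in feature_vectors) / n for i in range(dim)]

def cosine_similarity(a, b):
    """Similarity of two vectors, 1.0 when they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(utterance, voice_print, threshold=0.8):
    """1:1 check: does the utterance match the claimed voice print?
    The threshold value is an assumption chosen for this toy example."""
    return cosine_similarity(utterance, voice_print) >= threshold

def identify(utterance, voice_prints):
    """1:N search: return the enrolled name whose voice print is closest."""
    return max(voice_prints,
               key=lambda name: cosine_similarity(utterance, voice_prints[name]))

# Toy data: two enrolled speakers with two-dimensional "features".
voice_prints = {
    "alice": enroll([[1.0, 0.0], [0.9, 0.1]]),
    "bob": enroll([[0.0, 1.0], [0.1, 0.9]]),
}
sample = [1.0, 0.05]
print(verify(sample, voice_prints["alice"]))
print(identify(sample, voice_prints))
```

Verification answers a yes/no question about one claimed identity, while identification picks the best match among all enrolled speakers.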
Each speaker recognition system has two phases:
Enrollment and verification
Speaker recognition systems fall into two categories:
Speaker verification and speaker identification
Hidden Markov models (HMMs): statistical models that output a sequence of symbols or quantities
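To make the idea of a hidden Markov model concrete, here is a minimal sketch of Viterbi decoding, which recovers the most likely sequence of hidden states behind a sequence of observed symbols. The states ("vowel", "consonant"), the observed symbols ("high", "mid", "low" energy), and every probability below are invented toy values, not taken from any real recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, its probability)."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               the previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, back = max((prev[p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], back)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob

# Toy model (all numbers are assumptions for illustration only).
states = ("vowel", "consonant")
start_p = {"vowel": 0.6, "consonant": 0.4}
trans_p = {
    "vowel": {"vowel": 0.7, "consonant": 0.3},
    "consonant": {"vowel": 0.4, "consonant": 0.6},
}
emit_p = {
    "vowel": {"high": 0.5, "mid": 0.4, "low": 0.1},
    "consonant": {"high": 0.1, "mid": 0.3, "low": 0.6},
}

path, prob = viterbi(["high", "mid", "low"], states, start_p, trans_p, emit_p)
print(path)  # ['vowel', 'vowel', 'consonant']
```

In a speech recognizer the hidden states would correspond to sub-word units and the observations to acoustic feature vectors rather than three discrete symbols.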
Dynamic time warping (DTW): an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation.
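A minimal sketch of dynamic time warping on one-dimensional sequences is shown below. It uses the standard dynamic-programming recurrence with absolute difference as the local cost (an assumed choice); the "slow" and "fast" sequences are made-up numbers standing in for the same pattern produced at two different speeds.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a
    #              with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed metric)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays (b is stretched)
                cost[i][j - 1],      # b advances, a stays (a is stretched)
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]

slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]  # same shape, half the speed
fast = [0, 1, 2, 1, 0]
print(dtw_distance(slow, fast))  # 0.0
```

Because every element of the slow sequence can be aligned to an identical element of the fast one, the distance is zero even though the sequences have different lengths.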
The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs
End-to-end models jointly learn all the components of the speech recognizer. This simplifies both training and deployment.
For example, an n-gram language model is required for all HMM-based systems, and a typical n-gram language model often takes several gigabytes of memory, making it impractical to deploy on mobile devices.
Founded in 2016, Silicon Valley startup AISense has raised $13 million in funding to develop their “Otter Voice Notes” app, a solution for transcribing long conversations between multiple people. Otter separates and identifies speakers, and allows users to store, search, analyze and share voice conversations. AISense provides the service through a cloud platform that includes storage as well, running their algorithms on Nvidia graphics processors (GPUs).
Otter is available for consumers through the App Store and Google Play with a free plan that contains up to 600 minutes of transcription a month, or ten times that for $10 a month. Enterprise use cases include call centers, online meetings, and pre-production media content – all priced on a case-by-case basis.
Founded in 2016, Los Angeles startup Behavioral Signals has raised $1.5 million to develop a conversation analytics suite complete with automated transcription and behavioral analytics. Their “callER Analytics Engine” transcribes and analyzes calls while looking at the speakers’ emotional state to come up with a final success score.
By measuring factors like tone, positivity, politeness, or arousal, the engine can help sales teams increase revenue by as much as 10% and even reduce agent attrition, the company claims.
Founded in 2017, Netherlands startup SpeakSee has raised an undisclosed amount of funding to develop a small handheld microphone that provides real-time transcription for people with hearing problems. The company is currently running an Indiegogo campaign which has already exceeded its $50,000 target by 63%. These handheld microphones connect to a smartphone using Wi-Fi and listen in the direction they are pointed, so background noise is effectively cancelled out.
Data is relayed to their base station, then transmitted to the SpeakSee app. Mics are compatible with conference call systems and televisions as well, and the platform supports more than 120 languages or dialects. (Really?) One mic+dock combo costs $250 and a dock with three mics costs $350 at current early bird rates on Indiegogo.
Persian TTS
The knowledge-based company Asr Gooyesh Pardaz offers a variety of software products and services in the fields of artificial intelligence and speech signal processing. An experienced research team of specialists from Sharif University of Technology works at the company. The company provides sales consulting, installation, and support for its products through experienced staff. Asr Gooyesh Pardaz is a pioneer in developing speech-based technologies for the Persian language. Its achievements include speech-to-text, text-to-speech, audio search, and voice-based identity recognition technologies.
Nevisa voice typing software