How I helped my daughter read with machine learning

📖 👨‍👧

Vincent Ogloblinsky - @vogloblinsky

Vincent Ogloblinsky

Compodoc maintainer

@vogloblinsky

Google Developer Expert on Web Technologies

Software architect / Open-Source referent

Orange Innovation / Data & AI / ARoD

Disclaimer

This talk is just a "technical" overview of machine learning from a developer's perspective.

I have no formal data-science training. 😉

Some topics (e.g. model optimization) are not covered yet.

Agenda

1. Genesis of the idea

2. Learning to read

3. Machine learning

4. Speech to text

5. Building the model

6. Results and outlook

Genesis of the idea

Like any geek dad who handles the evening reading:

- guide his daughter by breaking words into syllables with a finger

- guide and correct her reading aloud

- work professionally on voice technologies (Orange - Data & AI)

- imagine an application built on an adapted "speech to text" engine plus a good dose of interactivity

- survey the market and realize that no such application exists

Perfect! A new technical challenge for the geek dad 😀

Genesis of the idea

(Diagram: reading help web application, 🗣️ child's voice, machine learning, speech to text)

Definition of "ready"

Let's set some technical constraints

100% "web" technologies

- JavaScript

- WebGL and/or WebAssembly

Offline & privacy by design

- no calls to remote APIs

- no identification of the child

Learning to read

A 7-step process

https://www.bloghoptoys.fr/pas-a-pas-8-etapes-pour-apprendre-a-lire

1 - Awareness of spoken sounds

2 - Awareness of the link between spoken and written language

3 - Discovering the 26-letter alphabet

4 - Understanding sound-letter associations

5 - Understanding syllabic fusion

6 - Recognizing words

7 - Understanding texts

Syllabic Fusion

"château"

ch says "chhh"

a says "aa"

t says "ttt"

eau says "ooo"

A mental synthesis skill: blending the sound of a consonant with the sound of a vowel

/p/ and /a/ → pa

The child needs to understand that speech is segmented into words and also into smaller sound units: phonemes and syllables (phoneme fusion)

The richness of the French language

26 letters in the alphabet

36 phonemes

vowels: [a] (table, patte), [é] (éléphant, parler), [o] (bonnet, chaud), ...

semi-vowels: [j] (fille, rail), ...

consonants: [b] (billets, abbé), [g] (gâteau, aggraver), ...

190 graphemes

[o]: o, au, eau

[k]: c, qu (coque)
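Because one sound can be spelled several ways ([o] as o, au or eau), a reading aid has to cut a written word into graphemes before linking each one to a sound. A minimal sketch, assuming a greedy longest-match strategy; `GRAPHEMES` and `segment` are hypothetical names, and the table covers only a handful of the ~190 French graphemes:

```python
# Hypothetical, tiny grapheme -> phoneme table: just enough to segment "château".
GRAPHEMES = {
    "eau": "o",
    "au": "o",
    "ch": "ʃ",
    "qu": "k",
    "â": "a",
    "a": "a",
    "o": "o",
    "c": "k",
    "t": "t",
}
MAX_LEN = max(len(g) for g in GRAPHEMES)


def segment(word: str) -> list[tuple[str, str]]:
    """Split a word into (grapheme, phoneme) pairs, trying the longest grapheme first."""
    pairs = []
    i = 0
    while i < len(word):
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in GRAPHEMES:
                pairs.append((chunk, GRAPHEMES[chunk]))
                i += size
                break
        else:
            raise ValueError(f"no grapheme in the table matches {word[i:]!r}")
    return pairs


print(segment("château"))  # [('ch', 'ʃ'), ('â', 'a'), ('t', 't'), ('eau', 'o')]
```

Longest match matters here: trying "e" before "eau" would wrongly split the final sound of "château" into three letters.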

Machine learning

AI at Orange

Machine learning

A subfield of artificial intelligence

Algorithms that discover patterns in datasets

4 steps:

- select and prepare the data

- select the algorithm to apply

- train the algorithm (which produces the model)

- use the model (and improve it)
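The four steps can be illustrated end to end on a toy problem. A minimal sketch, assuming made-up 2-D feature vectors and a nearest-centroid classifier chosen only for its simplicity (the project itself uses a neural network):

```python
import math
from collections import defaultdict

# Step 1 - select and prepare data: made-up 2-D feature vectors with labels.
data = [((1.0, 1.2), "pa"), ((0.9, 1.0), "pa"), ((1.1, 0.8), "pa"),
        ((3.0, 3.1), "to"), ((3.2, 2.9), "to"), ((2.8, 3.0), "to")]
# Keep every third sample aside for testing.
train = [d for i, d in enumerate(data) if i % 3 != 2]
test = [d for i, d in enumerate(data) if i % 3 == 2]


# Step 2 - select the algorithm: a nearest-centroid classifier.
def fit(samples):
    """Step 3 - training: average the feature vectors of each label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in groups.items()
    }


def predict(model, features):
    """Step 4 - use: return the label whose centroid is closest."""
    return min(model, key=lambda label: math.dist(features, model[label]))


model = fit(train)
correct = sum(predict(model, features) == label for features, label in test)
print(f"test accuracy: {correct}/{len(test)}")  # test accuracy: 2/2
```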

Machine learning

3 main types of machine learning

- supervised learning: labeled data, task-driven (labeling is expensive)

- unsupervised learning: unlabeled data, data-driven (the algorithm searches for patterns on its own)

- reinforcement learning: the algorithm learns from its mistakes to achieve an objective

Speech to text

Speech to text

Also called automatic speech recognition (ASR)

https://towardsdatascience.com/audio-deep-learning-made-simple-automatic-speech-recognition-asr-how-it-works-716cfce4c706
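ASR pipelines like the one described in the linked article typically output a probability distribution over characters for each audio frame and decode it with CTC (Connectionist Temporal Classification). This step is not part of this project, which later falls back to plain classification. A minimal greedy CTC decoder, assuming a hypothetical `_` blank symbol and made-up frame probabilities:

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol for the CTC "blank" token


def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy CTC decoding of per-frame probabilities into text.

    frame_probs: one list of probabilities per audio frame, ordered like `alphabet`.
    """
    # 1. Keep the most likely symbol for each frame.
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    # 2. Merge consecutive repeats ("pp" -> "p").
    merged = [symbol for symbol, _ in groupby(best)]
    # 3. Drop blanks, which is what lets genuine double letters survive step 2.
    return "".join(symbol for symbol in merged if symbol != BLANK)


def one_hot_frames(path, alphabet, peak=0.8):
    """Build made-up frame probabilities whose most likely symbols spell `path`."""
    rest = (1 - peak) / (len(alphabet) - 1)
    return [[peak if s == c else rest for s in alphabet] for c in path]


alphabet = [BLANK, "a", "p"]
print(ctc_greedy_decode(one_hot_frames("pp_aa_pa", alphabet), alphabet))  # papa
print(ctc_greedy_decode(one_hot_frames("p_p", alphabet), alphabet))       # pp
```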


Speech to text at Orange

(Diagram: six speech-to-text services, labeled Service 1 to Service 6)

Speech to text and children's voices

Current voice assistants are trained on adult speech datasets

Children's voices are acoustically different: higher pitch, a narrower vocal tract, smaller vocal cords; in short, they are still growing

A richer spectrum

Few children's voice datasets are available
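To make "higher pitch" concrete: a common way to estimate the fundamental frequency of a voiced sound is to find the autocorrelation peak. A minimal NumPy sketch on synthetic pure tones; the 16 kHz sample rate and the 120 Hz "adult" / 300 Hz "child" values are illustrative assumptions, not measurements from this project:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, assumed


def tone(freq_hz, duration_s=0.1, sample_rate=SAMPLE_RATE):
    """Synthetic pure tone standing in for a voiced sound."""
    t = np.arange(round(duration_s * sample_rate)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)


def estimate_pitch(signal, sample_rate=SAMPLE_RATE, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    signal = signal - signal.mean()
    # Autocorrelation for non-negative lags only.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # Only search lags that correspond to plausible voice pitches.
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    lag = min_lag + int(np.argmax(corr[min_lag:max_lag]))
    return sample_rate / lag


print(round(estimate_pitch(tone(120)), 1))  # close to 120 Hz ("adult")
print(round(estimate_pitch(tone(300)), 1))  # close to 300 Hz ("child")
```

The lag resolution is one sample, so the estimate is only approximate (about 1 to 2 Hz off here); real recordings would also need voiced/unvoiced detection.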

Model building

2 possible approaches: "from scratch" or by "transfer learning"

- from scratch:

  Advantage:

  - full control over the model

  Drawback:

  - requires a lot of data

- transfer learning:

  Advantage:

  - benefits from the model's initial training

  Drawback:

  - less control over the model


https://datascientest.com/transfer-learning

Transfer learning

https://datascientest.com/transfer-learning


Sound classification

https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5

A simpler use case than full ASR

Model building

🔘 Speech Commands dataset (www.tensorflow.org/datasets/catalog/speech_commands)

- released by Google in 2017

- 65,000 one-second clips of 30 short words, spoken by thousands of people

🔘 TensorFlow as the machine learning framework

🔘 Training locally (Python), then exporting the model

TensorFlow

Developed by Google Brain

v1.0.0 released in 2017 (current version: 2.8.0)

TensorFlow.js

Uses the GPU under the hood, through the WebGL API

Data gathering

A web interface for collecting data

- a simple set of 20 syllables

- recordings received as WAV files

- no information collected about the child (age, gender)

Data preparation

A web interface for cleaning the data

- one sound per syllable per child

- trimming to 1 s

- removing parasitic sounds (uh, ...)

+ data augmentation (pitch variation)
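The length normalization and pitch-variation augmentation can be sketched in a few lines. A minimal NumPy sketch, assuming 16 kHz mono clips (the Speech Commands clips use 16 kHz); `fix_length` and `naive_pitch_shift` are hypothetical helpers, and the naive resampling shown here also changes duration, so it is not necessarily the technique used in the project:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, assumed


def fix_length(samples, target_len=SAMPLE_RATE):
    """Trim or zero-pad a mono clip to exactly 1 s (target_len samples).

    Keeps the first second; a real tool would first locate the syllable.
    """
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) >= target_len:
        return samples[:target_len]
    return np.pad(samples, (0, target_len - len(samples)))


def naive_pitch_shift(samples, factor):
    """Resample by `factor` (> 1 raises the pitch) with linear interpolation.

    The duration changes too, so the result is passed back through fix_length.
    """
    positions = np.arange(0, len(samples) - 1, factor)
    shifted = np.interp(positions, np.arange(len(samples)), samples)
    return fix_length(shifted)


# Usage on a made-up 0.8 s clip (random noise standing in for a recording).
rng = np.random.default_rng(0)
clip = fix_length(rng.normal(scale=0.1, size=12_800))
augmented = [naive_pitch_shift(clip, factor) for factor in (0.9, 1.1)]
print([a.shape for a in augmented])
```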

Model training

1. Splitting the data

80% for training

10% for validation (used by TensorFlow during training)

10% for testing

2. Inspecting a few spectrograms

3. Loading the base model
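The 80/10/10 split of step 1 can be sketched with the standard library alone. A minimal sketch, assuming the data is a list of file names (the names below are hypothetical); a fixed seed keeps the split reproducible between runs:

```python
import random


def split_dataset(items, train=0.8, validation=0.1, seed=42):
    """Shuffle, then split into train / validation / test (the remainder)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * validation)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


# Hypothetical file names, for illustration only.
files = [f"clip_{i:03d}.wav" for i in range(100)]
train_files, val_files, test_files = split_dataset(files)
print(len(train_files), len(val_files), len(test_files))  # 80 10 10
```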

Model training

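4. Freezing all layers of the model except the last one

# Freeze every layer except the final (classification) layer
for layer in model.layers[:-1]:
    layer.trainable = False

# Recompile so that the new trainable flags are taken into account
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["acc"])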

Model training

Printing information about the model's layers

Model training

5. Training: ~5 min

Model training

6. Checking the loss

The loss measures the gap between the neural network's predictions and the actual labels of the observations used for training

(Chart: loss per iteration)
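The loss named in the `compile()` call, sparse categorical cross-entropy, is the average of minus the log of the probability the model gave to the correct class. A minimal NumPy sketch with made-up probabilities; it illustrates the formula and is not Keras' own implementation:

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability the model gave to the true class).

    labels: integer class indices, shape (n,)
    probs: predicted probabilities (e.g. softmax outputs), shape (n, n_classes)
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    labels = np.asarray(labels)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs)))


# Made-up predictions for two samples over three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8]]
print(sparse_categorical_crossentropy([0, 2], probs))  # about 0.29
print(sparse_categorical_crossentropy([1, 0], probs))  # wrong guesses: higher loss
```

As training progresses, the probabilities assigned to the correct classes rise and this value falls, which is what the loss curve shows.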

Model training

7. Checking the accuracy

Accuracy measures the proportion of correct predictions; in a binary problem, it counts both correctly predicted positives and correctly predicted negatives

(Chart: accuracy per iteration)
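Written as a formula, with TP, TN, FP and FN the numbers of true positives, true negatives, false positives and false negatives. For a multi-class model such as the syllable classifier, the general form on the left applies:

```latex
\text{accuracy} = \frac{\text{correct predictions}}{\text{all predictions}}
\qquad\text{binary case:}\quad
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```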

Model training

8. Displaying the confusion matrix
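A confusion matrix counts, for each actual class, how often each class was predicted, so it shows which syllables get mixed up. A minimal sketch in plain Python, assuming made-up labels where a "pa" is once mistaken for "ba":

```python
def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs: rows = actual class, columns = predicted."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


classes = ["pa", "ba", "ma"]
actual = ["pa", "pa", "ba", "ba", "ma"]       # made-up ground truth
predicted = ["pa", "ba", "ba", "ba", "ma"]    # made-up predictions

matrix = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, matrix):
    print(label, row)
# pa [1, 1, 0]
# ba [0, 2, 0]
# ma [0, 0, 1]

# Accuracy is the diagonal divided by the total number of predictions.
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(actual)
print(accuracy)  # 0.8
```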

Model training

9. Checking with additional labeled test files

Model export

import json
import os
from datetime import datetime

import tensorflowjs as tfjs

# Convert the model to the TensorFlow.js Layers model format.
# Assumes `model` is the trained Keras model and `commands` the list of labels.
tfjs_model_dir = "./thot-model-tfjs-1"
tfjs.converters.save_keras_model(model, tfjs_model_dir)

# Create the metadata.json file (labels, frame size, generation date).
now = datetime.now()
metadata = {
    "words": list(commands),
    "frameSize": model.input_shape[-2],
    "generated_at": now.strftime("%Y-%m-%d-%H:%M:%S")
}
with open(os.path.join(tfjs_model_dir, "metadata.json"), "w") as f:
    json.dump(metadata, f)

4.1 MB

1.6 MB

Model import in JavaScript

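@tensorflow-models/speech-commands: the JavaScript package used to drive the model

// TensorFlow.js packages used alongside speech-commands
import * as tf from '@tensorflow/tfjs-core';
import * as tfl from '@tensorflow/tfjs-layers';
import * as speechCommands from '@tensorflow-models/speech-commands';

// Create a recognizer backed by the exported custom model and its metadata
const recognizer = speechCommands.create(
    'BROWSER_FFT',  // spectrograms computed with the browser's native FFT
    null,           // no built-in vocabulary: a custom model is supplied
    'http://test.com/my-audio-model/model.json',
    'http://test.com/my-audio-model/metadata.json'
);

// Load the model and metadata before predicting (inside an ES module or async function)
await recognizer.ensureModelLoaded();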

Use of the model in JavaScript

Continuous listening with the getUserMedia API, then every ~1 s (setInterval):

1. Retrieve the audio frequencies

2. Create the spectrogram

3. Send it to the TensorFlow model

4. Retrieve the predictions
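In the browser, the `'BROWSER_FFT'` option computes the spectrogram natively. The same "create the spectrogram" step can be sketched offline as a short-time Fourier transform. A minimal NumPy sketch; the 16 kHz sample rate, 512-sample window and 256-sample hop are assumptions, not the values used by @tensorflow-models/speech-commands:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, assumed
FRAME_LEN = 512       # samples per FFT window, assumed
HOP = 256             # samples between consecutive windows, assumed


def spectrogram(samples):
    """Log-magnitude STFT: one row per frame, one column per frequency bin.

    Expects at least FRAME_LEN samples (about 1 s of audio in this use case).
    """
    window = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(samples) - FRAME_LEN) // HOP
    frames = np.stack([samples[i * HOP:i * HOP + FRAME_LEN] * window
                       for i in range(n_frames)])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitudes + 1e-6)  # small offset avoids log(0)


# One second of a 1 kHz test tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (61, 257)
```

Each bin covers SAMPLE_RATE / FRAME_LEN = 31.25 Hz, so the 1 kHz tone shows up as a bright horizontal line in bin 32; a matrix like this is what gets sent to the model.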

Results and outlook

Demo: syllable

Demo: a word, syllable by syllable

Demo: word by word

Outlook

Scaling the model through crowdsourcing

Adaptation layer on the application side: correction, guidance

Detection of phonological dyslexia

Gamification of the child's learning path

Customization of the model to the child's voice (on-device)

Personal feedback

A great technical adventure

An exciting, fast-growing ML field (OpenAI, etc.)

A test, fail & learn approach, perfect for this side project

Resource

https://teachablemachine.withgoogle.com

Thank you for your attention!

Questions?

Slides: bit.ly/3uBPDYR
