It's alive!
Machine Learning writes your code
Dominic Elm
Uri Shaked
@elmd_
@UriShaked
How Everything Started
ngVikings 2019
AngularConnect 2018
How to AI in JS? - Asim Hussain
Thank You, Asim!
RESEARCH QUESTION
Given a function signature, can we create a model that will predict the body of that function?
Who Are We?
Dominic Elm
Software Engineer
Trainer & Consultant
@thoughtram
@stackblitz
Uri Shaked
Google Developer Expert
Community Organizer
Machine Learning 101
Traditional Program
email = 'How to be a Millionaire in 4 weeks'
if (email contains 'Millionaire')
    markAsSpam(email)
else if (email contains '...')
    ...
else if (email contains '...')
    ...
ML Program
data = [
    ('How to be a Millionaire in 4 weeks', SPAM),
    ('...', NO_SPAM),
    ('...', NO_SPAM),
    ('...', SPAM),
    ...
]
for example in data:
    classify data
    optimize
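To make the "classify, then optimize" loop concrete, here is a minimal sketch using a perceptron-style word-weight update. The update rule, the helper names, and every training example except the first are assumptions for illustration; the slides do not say which learning algorithm is meant.

```python
SPAM, NO_SPAM = 1, 0


def classify(words, weights, bias):
    """Score an email by summing the learned weight of each word."""
    score = bias + sum(weights.get(word, 0.0) for word in words)
    return SPAM if score > 0 else NO_SPAM


def train(data, epochs=10):
    """Classify every example and nudge the weights whenever the guess is wrong."""
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            words = text.lower().split()
            error = label - classify(words, weights, bias)  # 0 when correct
            if error:
                # "optimize": move the score toward the correct label
                bias += error
                for word in words:
                    weights[word] = weights.get(word, 0.0) + error
    return weights, bias


# Hypothetical training data; only the first example comes from the slide.
data = [
    ("How to be a Millionaire in 4 weeks", SPAM),
    ("Meeting notes for Monday", NO_SPAM),
    ("Lunch at noon?", NO_SPAM),
    ("Become a Millionaire today", SPAM),
]

weights, bias = train(data)
print(classify("millionaire in weeks".split(), weights, bias))
print(classify("meeting notes".split(), weights, bias))
```

No rule about the word "Millionaire" is written by hand here; the weight for that word emerges from the labeled examples.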
Neural Networks???
Figure: a single neuron
Inputs: square meters = 120, #bedrooms = 4
Weights: 0.2, 0.1
Output: 120 x 0.2 + 4 x 0.1 = 24.4
Figure: a single neuron, with error
Inputs: square meters = 120, #bedrooms = 4
Weights: 0.2, 0.1
Output: 120 x 0.2 + 4 x 0.1 = 24.4
Expected: 15
ERROR: 24.4 - 15 = 9.4
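Figure: a single neuron, after adjusting the weights
Inputs: square meters = 120, #bedrooms = 4
Weights: 0.1, 0.05
Output: 120 x 0.1 + 4 x 0.05 = 12.2
Expected: 15
ERROR: 12.2 - 15 = -2.8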
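The arithmetic on these three slides can be reproduced with a few lines. This is a sketch of a single neuron without bias or activation function; reading the unlabeled 15 as the expected value is an assumption, backed by the fact that both error values equal the output minus 15.

```python
def neuron_output(inputs, weights):
    """Weighted sum of the inputs: one multiplication per input, then add."""
    return sum(x * w for x, w in zip(inputs, weights))


inputs = [120, 4]  # square meters, #bedrooms
expected = 15      # assumed to be the target value shown on the slides

for weights in ([0.2, 0.1], [0.1, 0.05]):
    output = neuron_output(inputs, weights)
    error = output - expected
    print(f"weights={weights} output={output:.1f} error={error:.1f}")
```

Halving the weights moves the error from 9.4 to -2.8, so it is smaller in size but now the neuron undershoots. Training repeats this kind of adjustment until the error is as close to zero as possible.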
Figure: a neural network with Input, Hidden, and Output layers
HOW DO WE PREDICT FUNCTION BODIES?
MODEL
function greet(name: string)
?
function greet(name: string) {
const prefix = name.length < 10 ? 'Hi' : 'Hello';
return prefix + name;
}
function greet(name: string) -> MODEL -> {
function greet(name: string) -> MODEL -> const
function greet(name: string) -> MODEL -> prefix
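The token-by-token prediction shown on these slides can be sketched as a loop that keeps asking the model for the next token until the body is complete. `generate_body`, `fake_model`, and `CANNED_BODY` are hypothetical names; the fake model simply replays a known body so the loop can run without a trained network, and the START and END markers are the ones introduced in the data cleaning step.

```python
def generate_body(signature, predict_next_token, max_tokens=50):
    """Ask the model for one token at a time until it emits END."""
    tokens = ["START"]
    while len(tokens) < max_tokens:
        next_token = predict_next_token(signature, tokens)
        tokens.append(next_token)
        if next_token == "END":
            break
    return tokens


# Stand-in for a trained model: replays the body from the slides.
CANNED_BODY = (
    "START { const prefix = name . length < 10 ? 'Hi' : 'Hello' ; "
    "return prefix + name ; } END"
).split()


def fake_model(signature, tokens_so_far):
    return CANNED_BODY[len(tokens_so_far)]


body_tokens = generate_body("function greet(name: string)", fake_model)
print(" ".join(body_tokens))
```

The `max_tokens` limit is a design choice for this sketch: a real model may never emit END, so the loop needs its own stopping condition.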
ML Approach
1. Gather Data
2. Clean Data
3. Choose Model
4. Training
5. Evaluation
1. Gathering Data
How can we quickly gather a lot of function examples?
Look at open source projects on GitHub
Using Google BigQuery, we can run an SQL query to fetch all the code on GitHub in under a minute!
We kept only the TypeScript files, extracted 324,280 TypeScript functions, and collected them in one huge JSON file.
2. Cleaning Data
function greet(name: string) {
const prefix = name.length < 10 ? 'Hi' : 'Hello';
return prefix + name;
}
  1. Preprocess raw dataset
  2. Prepare model inputs
Split signature from body
function greet(name: string)
{
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
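A minimal sketch of the split, assuming the body starts at the first `{` that sits outside the parameter list. `split_signature` is a hypothetical helper, not the authors' code, and it would mis-split a signature whose return type is itself an object type.

```python
def split_signature(source):
    """Return (signature, body), splitting at the first '{' outside parentheses."""
    depth = 0
    for i, ch in enumerate(source):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "{" and depth == 0:
            return source[:i].rstrip(), source[i:]
    raise ValueError("no function body found")


source = """function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}"""

signature, body = split_signature(source)
print(signature)
print(body)
```

Tracking the parenthesis depth keeps a parameter such as `(options: { debug: boolean })` from being mistaken for the start of the body.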
Rename function parameters
function greet($arg0$: string)
{
  const prefix = $arg0$.length < 10 ? 'Hi' : 'Hello';
  return prefix + $arg0$;
}
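One way to sketch this renaming is a whole-word regex replacement of each parameter name with a positional placeholder. `rename_parameters` is a hypothetical helper; it assumes simple parameter lists (no destructuring or generics containing commas) and will also rewrite a parameter name that happens to appear inside a string literal.

```python
import re


def rename_parameters(signature, body):
    """Replace each parameter name with $arg0$, $arg1$, ... in signature and body."""
    params_text = signature[signature.index("(") + 1 : signature.rindex(")")]
    names = [
        part.split(":")[0].split("=")[0].strip().rstrip("?")
        for part in params_text.split(",")
        if part.strip()
    ]
    for i, name in enumerate(names):
        pattern = r"\b" + re.escape(name) + r"\b"
        placeholder = f"$arg{i}$"
        # A function replacement avoids any special handling of the placeholder text.
        signature = re.sub(pattern, lambda m: placeholder, signature)
        body = re.sub(pattern, lambda m: placeholder, body)
    return signature, body


signature, body = rename_parameters(
    "function greet(name: string)",
    "{\n  const prefix = name.length < 10 ? 'Hi' : 'Hello';\n  return prefix + name;\n}",
)
print(signature)
print(body)
```

Positional placeholders mean that `greet(name)` and `greet(user)` produce identical training data, so the model does not have to learn arbitrary naming choices.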
Rename identifiers and literals
function greet($arg0$: string)
{
  const id0 = $arg0$.id1 < 2 ? '3' : '4';
  return id0 + $arg0$;
}
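The slide suggests that identifiers and literals share one counter, assigned in order of first appearance (`prefix` becomes `id0`, `length` becomes `id1`, `10` becomes `2`, and so on). The sketch below reproduces that output under this assumption; the keyword list and the token regex are simplifications chosen for this example, not the authors' implementation.

```python
import re

# Assumed, incomplete keyword list; anything listed here is left untouched.
KEYWORDS = {
    "const", "let", "var", "return", "if", "else", "for", "while",
    "function", "new", "true", "false", "null", "undefined",
}

# Parameter placeholders, string literals, numbers, then identifiers.
TOKEN = re.compile(r"\$arg\d+\$|'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?|[A-Za-z_]\w*")


def rename_identifiers_and_literals(body):
    """Replace identifiers and literals with numbered stand-ins."""
    mapping = {}

    def replace(match):
        text = match.group(0)
        if text.startswith("$") or text in KEYWORDS:
            return text
        index = mapping.setdefault(text, len(mapping))
        if text[0] in "'\"":
            return f"'{index}'"      # string literal keeps its quotes
        if text[0].isdigit():
            return str(index)        # numeric literal stays a number
        return f"id{index}"          # identifier

    return TOKEN.sub(replace, body)


body = "{\n  const prefix = $arg0$.length < 10 ? 'Hi' : 'Hello';\n  return prefix + $arg0$;\n}"
print(rename_identifiers_and_literals(body))
```

The same original name always maps to the same stand-in, which is why both uses of `prefix` become `id0`.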
Add spaces
function greet ( $arg0$ : string )
{
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
}
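Adding spaces can be sketched as finding every token with a regex and joining the tokens with single spaces. The regex is an assumption made for this example: it keeps `$arg0$` placeholders and single-quoted strings together, and treats every other punctuation character as its own token, so a multi-character operator such as `===` would be split into three.

```python
import re

TOKEN = re.compile(r"\$arg\d+\$|'[^']*'|\w+|[^\w\s]")


def add_spaces(code):
    """Separate all tokens on each line with exactly one space."""
    return "\n".join(" ".join(TOKEN.findall(line)) for line in code.splitlines())


print(add_spaces("function greet($arg0$: string)"))
print(add_spaces("const id0 = $arg0$.id1 < 2 ? '3' : '4';"))
```

After this step, a plain split on whitespace is enough to recover the tokens.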
Add START and END symbols
function greet ( $arg0$ : string )
START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END
Prepare model inputs
- Tokenize
- Text to Sequence
- Add padding
- Create inputs and outputs
Tokenization = Chopping the function body into pieces (tokens)
function greet ( $arg0$ : string )
START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END
dict = {
  'function': 1,
  'greet': 2,
  '(': 3,
  '$arg0$': 4,
  ':': 5,
  'string': 6,
  ')': 7,
  'START': 8,
  '{': 9,
  ...
}
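A dictionary like the one on the slide can be built by numbering tokens in order of first appearance, starting at 1. This is a sketch under that assumption; `build_vocabulary` is a hypothetical helper, and library tokenizers may number tokens differently, for example by frequency. Index 0 is left unused here so that it can serve as the padding value.

```python
def build_vocabulary(texts):
    """Map every distinct token to an integer, in order of first appearance."""
    vocabulary = {}
    for text in texts:
        for token in text.split():
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary) + 1  # 0 stays free for padding
    return vocabulary


texts = [
    "function greet ( $arg0$ : string )",
    "START { const id0 = $arg0$ . id1 < 2 ? '3' : '4' ; return id0 + $arg0$ ; } END",
]
vocabulary = build_vocabulary(texts)
print(vocabulary)
```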
Text to Sequence
function greet ( $arg0$ : string ) -> [1, 2, 3, 4, 5, 6, 7]
function isPrime ( $arg0$ : number ) -> [1, 13, 3, 4, 5, 23, 7]
Add Padding
[1, 2, 3, 4, 5, 6, 7] -> [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7]
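Both steps fit in a few lines of plain Python. The sketch assumes a token-to-index dictionary already exists and that padding zeros go in front of the sequence, as on the slide; the target length of 14 is taken from the slide's example, and the helper names are hypothetical.

```python
def text_to_sequence(text, vocabulary):
    """Replace every token with its index in the vocabulary."""
    return [vocabulary[token] for token in text.split()]


def pad_sequence(sequence, length, pad_value=0):
    """Left-pad with pad_value, keeping only the last `length` items if too long."""
    if len(sequence) > length:
        sequence = sequence[len(sequence) - length:]
    return [pad_value] * (length - len(sequence)) + sequence


# Indices as shown on the slides.
vocabulary = {
    "function": 1, "greet": 2, "(": 3, "$arg0$": 4, ":": 5,
    "string": 6, ")": 7, "isPrime": 13, "number": 23,
}

sequence = text_to_sequence("function greet ( $arg0$ : string )", vocabulary)
print(sequence)
print(pad_sequence(sequence, 14))
```

Padding gives every sequence the same length, which lets many examples be stacked into one fixed-size batch for the model.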
Create Model Inputs and Outputs (X1, X2 and Y)
Inputs: Signature (X1), Sequence (X2)
Output: Next Token (Y)
Signature (X1) | Sequence (X2) | Next Token (Y)
function greet ( $arg0$ : string ) | START | {
function greet ( $arg0$ : string ) | START { | const
function greet ( $arg0$ : string ) | START { const | id0
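The rows of this table can be generated by sliding over the body one token at a time: everything seen so far becomes X2 and the token that follows becomes Y. `make_training_examples` is a hypothetical helper, shown on tokens rather than integer sequences for readability.

```python
def make_training_examples(signature_tokens, body_tokens):
    """Create one (X1, X2, Y) triple for every token after START."""
    examples = []
    for i in range(1, len(body_tokens)):
        x1 = signature_tokens   # the signature is the same for every example
        x2 = body_tokens[:i]    # the body generated so far
        y = body_tokens[i]      # the token the model should predict next
        examples.append((x1, x2, y))
    return examples


signature_tokens = "function greet ( $arg0$ : string )".split()
body_tokens = (
    "START { const id0 = $arg0$ . id1 < 2 ? '3' : '4' ; "
    "return id0 + $arg0$ ; } END"
).split()

examples = make_training_examples(signature_tokens, body_tokens)
for x1, x2, y in examples[:3]:
    print(" ".join(x2), "->", y)
```

A single function therefore yields one training example per body token, with END as the final target.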
Encode Output: One Hot Encoding
Next Token (Y) | Index | One-hot vector
{ | 9 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
string | 6 | [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
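One-hot encoding turns a token index into a vector that is all zeros except for a single 1 at that index. The sketch uses a vector of size 10 with zero-based positions, which matches both examples on the slide; a real vocabulary would be far larger.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a 1 at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


print(one_hot(9, 10))  # token '{'
print(one_hot(6, 10))  # token 'string'
```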
3. Choose Model
Look at Similar Problems
Machine Translation
Using TensorFlow
4. Training the Model
Google Colab
Google Cloud TPU (Tensor Processing Unit)
5. Evaluation
Evaluating the performance of the model
DEMO TIME
Takeaways
- Take advantage of the cloud
- Look for solutions to similar problems
- Data processing makes up a big chunk of the work
Thank You
Slides: https://go.urish.org/ml-vikings
Enjoy your lunch
🍱
Backlog
AI
ML
Deep Learning
Artificial Intelligence?
Machine Learning?
Deep Learning?
Artificial Intelligence
...is the science of making things smart.
Machine Learning
...an approach to achieve AI.
Learning from data and recognizing patterns, rather than being explicitly programmed.
Deep Learning
...specific technique for implementing ML.
Typically we use Neural Networks to implement ML and achieve AI.
Rule-based Systems
vs.
Learning from Data and recognizing patterns
Choose Model
Sequence to Sequence Model (Seq2Seq)
Sequence -> Encoder -> Decoder -> Sequence