It's alive!
Machine Learning writes your code
Dominic Elm
Uri Shaked
@elmd_
@UriShaked
How Everything Started
Angular Connect 2018
How to AI in JS? - Assim Hussain
Thank You Assim!
RESEARCH QUESTION
Given a function signature, can we create a model that will predict the body of that function?
Machine Learning 101
Traditional Program
email = 'How to be a Millionaire in 4 weeks'
if (email contains 'Millionaire')
    markAsSpam(email)
else if (email contains '...')
    ...
else if (email contains '...')
    ...
ML Program
data = [
    ('How to be a Millionaire in 4 weeks', SPAM),
    ('...', NO_SPAM),
    ('...', NO_SPAM),
    ('...', SPAM),
    ...
]
for example in data:
    classify data
    optimize
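To make the contrast concrete, here is a minimal Python sketch. Everything beyond the first example email is an assumption for illustration: the extra emails, the perceptron-style update rule, and the names `traditional_is_spam`, `train`, and `learned_is_spam` do not come from the talk.

```python
from collections import defaultdict

SPAM, NO_SPAM = 1, 0

# Hypothetical labeled examples; the slide elides the real data.
data = [
    ("How to be a Millionaire in 4 weeks", SPAM),
    ("Meeting notes for Monday", NO_SPAM),
    ("Become a Millionaire today", SPAM),
    ("Lunch on Monday?", NO_SPAM),
]

def traditional_is_spam(email: str) -> bool:
    # Traditional program: a rule written by hand.
    return "Millionaire" in email

def train(examples, epochs: int = 5):
    # ML program: classify each example, then adjust word weights on mistakes.
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            words = text.lower().split()
            score = sum(weights[w] for w in words)
            predicted = SPAM if score > 0 else NO_SPAM
            error = label - predicted          # 0 when the guess was right
            for w in words:
                weights[w] += error            # "optimize" step
    return weights

def learned_is_spam(weights, email: str) -> bool:
    return sum(weights.get(w, 0.0) for w in email.lower().split()) > 0

weights = train(data)
```

The hand-written rule only knows the keywords someone typed in, while the learned weights come entirely from the labeled examples.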
Neural Networks???
Inputs: square meters = 120, #bedrooms = 4, ...
Weights: 0.2, 0.1
Output: 120 x 0.2 + 4 x 0.1 = 24.4
Inputs: square meters = 120, #bedrooms = 4, ...
Weights: 0.2, 0.1
Output: 120 x 0.2 + 4 x 0.1 = 24.4
Expected: 15
ERROR: 24.4 - 15 = 9.4
Inputs: square meters = 120, #bedrooms = 4, ...
Weights: 0.1, 0.05
Output: 120 x 0.1 + 4 x 0.05 = 12.2
Expected: 15
ERROR: 12.2 - 15 = -2.8
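The arithmetic on these slides can be reproduced in a few lines of Python. This is only a sketch of a single weighted sum; the function name `predict` and reading 15 as the expected value are assumptions, and a real network adds more inputs, layers, and an automatic weight update.

```python
def predict(inputs, weights):
    # Weighted sum: multiply each input by its weight and add the results.
    return sum(x * w for x, w in zip(inputs, weights))

inputs = [120, 4]      # square meters, #bedrooms
expected = 15          # assumed to be the value the slides compare against

first = predict(inputs, [0.2, 0.1])       # roughly 24.4
first_error = first - expected            # roughly 9.4

second = predict(inputs, [0.1, 0.05])     # roughly 12.2
second_error = second - expected          # roughly -2.8
```

Halving the weights moves the error from about 9.4 to about -2.8, so the output has overshot in the other direction; training keeps nudging the weights until the error is close to zero.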
Input
Hidden
Output
HOW DO WE PREDICT FUNCTION BODIES?
function greet(name: string) -> MODEL -> ?
Expected body:
function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
function greet(name: string) -> MODEL -> {
Expected body:
function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
function greet(name: string) -> MODEL -> const
Expected body:
function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
function greet(name: string) -> MODEL -> prefix
Expected body:
function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
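The slides above show the model emitting the body one token at a time. A rough Python sketch of that loop follows. The names `generate_body`, `predict_next`, and `replay_model` are hypothetical, the `END` stop marker is an assumption at this point in the deck, and the stand-in "model" simply replays the expected tokens rather than predicting anything.

```python
from typing import Callable, List

def generate_body(signature: str,
                  predict_next: Callable[[str, List[str]], str],
                  max_tokens: int = 50) -> List[str]:
    # Ask the model for the next token until it signals the end.
    body: List[str] = []
    while len(body) < max_tokens:
        token = predict_next(signature, body)
        if token == "END":          # assumed stop marker
            break
        body.append(token)
    return body

# Stand-in for a trained model: replays a fixed token list.
EXPECTED = ["{", "const", "prefix", "=", "name", ".", "length", "<", "10",
            "?", "'Hi'", ":", "'Hello'", ";", "return", "prefix", "+",
            "name", ";", "}"]

def replay_model(signature: str, body_so_far: List[str]) -> str:
    position = len(body_so_far)
    return EXPECTED[position] if position < len(EXPECTED) else "END"

tokens = generate_body("function greet(name: string)", replay_model)
```

Each call sees the signature plus everything generated so far, which is why the first three outputs are `{`, `const`, and `prefix`.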
ML Approach
1. Gather Data
2. Clean Data
3. Choose Model
4. Training
5. Evaluation
1. Gathering Data
How can we quickly gather a lot of function examples?
Look at open source projects on GitHub
1. Gathering Data
Using Google BigQuery, we can run a SQL query that fetches all the code on GitHub in under a minute!
We kept only the TypeScript files, extracted 324,280 TypeScript functions from them, and collected the functions in one large JSON file.
2. Cleaning Data
function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Split signature from body
function greet(name: string)
{
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
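A minimal sketch of the split step in Python. The helper name `split_signature` is hypothetical, and it assumes the first `{` opens the function body, which fails for signatures that contain object types or default values with braces; a real pipeline would more likely use a TypeScript parser.

```python
from typing import Tuple

def split_signature(source: str) -> Tuple[str, str]:
    # Everything before the first "{" is treated as the signature.
    brace = source.index("{")
    return source[:brace].strip(), source[brace:].strip()

source = """function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}"""

signature, body = split_signature(source)
```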
Rename function parameters
function greet($arg0$: string)
{
  const prefix = $arg0$.length < 10 ? 'Hi' : 'Hello';
  return prefix + $arg0$;
}
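Renaming parameters could be sketched like this in Python. The function `rename_parameters` is a hypothetical helper; splitting the parameter list on commas is an assumption that breaks for generic types such as `Map<string, number>`, and the whole-word regex would also rewrite a property or string that happens to share a parameter's name.

```python
import re
from typing import Tuple

def rename_parameters(signature: str, body: str) -> Tuple[str, str]:
    # Read the parameter names out of "(name: type, other?: type)".
    params = signature[signature.index("(") + 1 : signature.rindex(")")]
    names = [p.split(":")[0].strip().rstrip("?")
             for p in params.split(",") if p.strip()]
    for i, name in enumerate(names):
        placeholder = f"$arg{i}$"
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        # Replace whole-word occurrences in both signature and body.
        signature = pattern.sub(lambda m: placeholder, signature)
        body = pattern.sub(lambda m: placeholder, body)
    return signature, body

sig, body = rename_parameters(
    "function greet(name: string)",
    "{\n  const prefix = name.length < 10 ? 'Hi' : 'Hello';\n  return prefix + name;\n}",
)
```

Replacing names with positional placeholders means the model does not have to learn that `name`, `user`, and `s` can all play the same role.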
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Rename identifiers and literals
function greet($arg0$: string)
{
  const id0 = $arg0$.id1 < 2 ? '3' : '4';
  return id0 + $arg0$;
}
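Judging from the slide, identifiers and literals share one counter in order of first appearance (`prefix` becomes `id0`, `length` becomes `id1`, `10` becomes `2`, and so on). A regex-based Python sketch of that idea follows; the keyword list is deliberately incomplete and the names `KEYWORDS`, `TOKEN`, and `rename_identifiers_and_literals` are assumptions, not the authors' code.

```python
import re

# Non-exhaustive keyword list; a real tokenizer would know all of them.
KEYWORDS = {"const", "let", "var", "return", "if", "else", "for", "while",
            "function", "new", "true", "false", "null", "undefined"}

# Placeholders, string literals, numbers, then identifiers (order matters).
TOKEN = re.compile(r"\$arg\d+\$|'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?|[A-Za-z_]\w*")

def rename_identifiers_and_literals(body: str) -> str:
    names = {}  # original text -> replacement, shared by ids and literals

    def replace(match):
        text = match.group(0)
        if text.startswith("$arg") or text in KEYWORDS:
            return text                      # keep placeholders and keywords
        if text not in names:
            n = len(names)
            if text[0] in "'\"":
                names[text] = f"'{n}'"       # string literal
            elif text[0].isdigit():
                names[text] = str(n)         # numeric literal
            else:
                names[text] = f"id{n}"       # identifier
        return names[text]

    return TOKEN.sub(replace, body)

renamed = rename_identifiers_and_literals(
    "{\n  const prefix = $arg0$.length < 10 ? 'Hi' : 'Hello';\n  return prefix + $arg0$;\n}"
)
```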
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Space tokens
function greet ( $arg0$ : string )
{
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
}
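Spacing tokens amounts to tokenizing each line and joining the tokens with single spaces. Here is a small Python sketch under stated assumptions: `space_tokens` is a hypothetical name, only single-quoted strings are handled, and multi-character operators such as `===` would be split into separate characters.

```python
import re

# Placeholders, single-quoted strings, words/numbers, then any other symbol.
TOKEN = re.compile(r"\$arg\d+\$|'[^']*'|\w+|[^\s\w]")

def space_tokens(line: str) -> str:
    # Put exactly one space between consecutive tokens.
    return " ".join(TOKEN.findall(line))

spaced_signature = space_tokens("function greet($arg0$: string)")
spaced_line = space_tokens("const id0 = $arg0$.id1 < 2 ? '3' : '4';")
```

With the tokens separated by spaces, later steps can split on whitespace instead of re-running a tokenizer.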
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Add START and END symbols
function greet ( $arg0$ : string )
START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Create Model Inputs and Outputs
Input                                                 | Output
function greet ( $arg0$ : string ) / START            | {
function greet ( $arg0$ : string ) / START {          | const
function greet ( $arg0$ : string ) / START { const    | id0
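The input/output rows above follow a simple pattern: the signature plus the output generated so far is the input, and the next token is the target. A Python sketch, with `make_training_pairs` as a hypothetical name:

```python
from typing import List, Tuple

Pair = Tuple[List[str], List[str], str]

def make_training_pairs(signature_tokens: List[str],
                        body_tokens: List[str]) -> List[Pair]:
    # Wrap the body in START/END, then emit one example per target token.
    output = ["START"] + body_tokens + ["END"]
    return [(signature_tokens, output[:i], output[i])
            for i in range(1, len(output))]

signature_tokens = "function greet ( $arg0$ : string )".split()
body_tokens = ("{ const id0 = $arg0$ . id1 < 2 ? '3' : '4' ; "
               "return id0 + $arg0$ ; }").split()

pairs = make_training_pairs(signature_tokens, body_tokens)
```

One function therefore yields as many training examples as it has body tokens plus one, the last of which teaches the model when to emit `END`.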
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Building a dictionary with all tokens
function greet ( $arg0$ : string )
START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END
dict = {
    'function': 1,
    'greet': 2,
    '(': 3,
    '$arg0$': 4,
    ':': 5,
    'string': 6,
    ')': 7,
    'START': 8,
    '{': 9,
    ...
}
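A dictionary like the one above can be built by numbering tokens in order of first appearance. In this Python sketch, `build_dictionary` is a hypothetical name, and starting at 1 (leaving 0 free for padding) is an assumption inferred from the slide's numbering.

```python
from typing import Dict, Iterable, List

def build_dictionary(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    # Assign the next free index to every token not seen before.
    dictionary: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in dictionary:
                dictionary[token] = len(dictionary) + 1   # 0 stays unused
    return dictionary

signature = "function greet ( $arg0$ : string )".split()
body = ("START { const id0 = $arg0$ . id1 < 2 ? '3' : '4' ; "
        "return id0 + $arg0$ ; } END").split()

dictionary = build_dictionary([signature, body])
```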
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Text to Sequence
function greet ( $arg0$ : string )    ->  [1, 2, 3, 4, 5, 6, 7]
function isPrime ( $arg0$ : number )  ->  [1, 13, 3, 4, 5, 23, 7]
Add Padding
[1, 2, 3, 4, 5, 6, 7]  ->  [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7]
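Both steps on this slide are short enough to sketch in plain Python. The names `text_to_sequence` and `pad_left` are hypothetical, the target length of 14 is simply what the slide shows, and the dictionary below contains only the entries that can be read off the slides.

```python
from typing import Dict, List

def text_to_sequence(tokens: List[str], dictionary: Dict[str, int]) -> List[int]:
    # Look up each token's index in the dictionary.
    return [dictionary[token] for token in tokens]

def pad_left(sequence: List[int], length: int) -> List[int]:
    # Prepend zeros so that every sequence has the same length;
    # sequences that are too long keep only their last `length` items.
    if len(sequence) >= length:
        return sequence[len(sequence) - length:]
    return [0] * (length - len(sequence)) + sequence

# Partial dictionary reconstructed from the slides.
dictionary = {"function": 1, "greet": 2, "(": 3, "$arg0$": 4, ":": 5,
              "string": 6, ")": 7, "isPrime": 13, "number": 23}

greet = text_to_sequence("function greet ( $arg0$ : string )".split(), dictionary)
is_prime = text_to_sequence("function isPrime ( $arg0$ : number )".split(), dictionary)
padded = pad_left(greet, 14)
```

Padding matters because the model expects every input in a batch to have the same length.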
2. Cleaning Data
  1. Preprocess raw dataset
  2. Prepare model inputs
Encode Output
Next Token (Y) | Index | One Hot Encoding
{              | 9     | [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
string         | 6     | [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
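One-hot encoding turns a token index into a vector that is all zeros except for a single 1 at that index. A minimal Python sketch, where `one_hot` is a hypothetical name and the vector size of 10 matches the slide rather than a real vocabulary size:

```python
from typing import List

def one_hot(index: int, size: int) -> List[int]:
    # All zeros except a 1 at position `index`.
    vector = [0] * size
    vector[index] = 1
    return vector

brace = one_hot(9, 10)       # '{' has index 9 in the dictionary
string = one_hot(6, 10)      # 'string' has index 6
```

In practice the vector is as long as the whole dictionary, so each possible next token gets its own position.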
3. Choose Model
Look at Similar Problems
3. Choose Model
Machine Translation
3. Choose Model
Using TensorFlow
4. Training the Model
Google Colab
4. Training the Model
Google Cloud TPU (Tensor Processing Unit)
5. Evaluation
Evaluating the performance of the model
DEMO TIME
Takeaways
- Take advantage of the cloud
- Look for solutions to similar problems
- Data processing makes up a big chunk of the work
Learn more
https://urish.org
It's Alive! Machine Learning Writes Your Code (AngularUP)
By Uri Shaked