It's alive!
Machine Learning writes your code


Dominic Elm
Uri Shaked
@elmd_
@UriShaked

How Everything Started


AngularConnect 2018
How to AI in JS? - Asim Hussain


Thank You Asim!





RESEARCH QUESTION
Given a function signature, can we create a model that will predict the body of that function?


Machine Learning 101







Traditional Program

email = 'How to be a Millionaire in 4 weeks'
if (email contains 'Millionaire')
    markAsSpam(email)
else if (email contains '...')
    ...
else if (email contains '...')
    ...

ML Program

data = [
    ('How to be a Millionaire in 4 weeks', SPAM),
    ('...', NO_SPAM),
    ('...', NO_SPAM),
    ('...', SPAM),
    ...
]
for example in data:
    classify data
    optimize
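To make the contrast concrete, here is a minimal sketch of the "ML Program" side in Python. It is not the approach used later in the talk: it simply counts words per label instead of running an optimization loop, and every example subject except the first is made up for illustration.

```python
from collections import Counter

SPAM, NO_SPAM = "SPAM", "NO_SPAM"

# Hypothetical labelled examples; only the first subject comes from the slide.
data = [
    ("How to be a Millionaire in 4 weeks", SPAM),
    ("Become a Millionaire from home", SPAM),
    ("Meeting notes for Monday", NO_SPAM),
    ("Lunch on Monday?", NO_SPAM),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {SPAM: Counter(), NO_SPAM: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    spam_score = sum(counts[SPAM][w] for w in words)
    no_spam_score = sum(counts[NO_SPAM][w] for w in words)
    return SPAM if spam_score > no_spam_score else NO_SPAM

counts = train(data)
label = classify(counts, "Millionaire in a week")
```

No rule mentions 'Millionaire' explicitly; the behaviour comes from the labelled data.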


Neural Networks???


Inputs: square meters = 120, #bedrooms = 4, ...
Weights: 0.2, 0.1
Prediction: 120 x 0.2 + 4 x 0.1 = 24.4


Inputs: square meters = 120, #bedrooms = 4, ...
Weights: 0.2, 0.1
Prediction: 120 x 0.2 + 4 x 0.1 = 24.4
Target: 15
ERROR: 24.4 - 15 = 9.4


Inputs: square meters = 120, #bedrooms = 4, ...
Weights: 0.1, 0.05
Prediction: 120 x 0.1 + 4 x 0.05 = 12.2
Target: 15
ERROR: 12.2 - 15 = -2.8
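The arithmetic on these slides can be reproduced in a few lines of Python. This is only a sketch of a single neuron with two inputs; the function name `predict` and the reading of 15 as the target value are assumptions made for illustration.

```python
def predict(inputs, weights):
    """Weighted sum of the inputs, as drawn on the slides."""
    return sum(x * w for x, w in zip(inputs, weights))

inputs = [120, 4]   # square meters, #bedrooms
target = 15         # assumed to be the expected output shown on the slides

first = predict(inputs, [0.2, 0.1])     # 120 x 0.2 + 4 x 0.1
second = predict(inputs, [0.1, 0.05])   # 120 x 0.1 + 4 x 0.05

# Error = prediction - target, rounded to hide floating-point noise.
errors = [round(first - target, 1), round(second - target, 1)]
```

Training amounts to repeating this loop: adjust the weights, recompute the prediction, and check whether the error moved closer to zero.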


Layers: Input, Hidden, Output


HOW DO WE PREDICT FUNCTION BODIES?


MODEL

Target function:

function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}

function greet(name: string)  ->  MODEL  ->  ?
function greet(name: string)  ->  MODEL  ->  {
function greet(name: string)  ->  MODEL  ->  const
function greet(name: string)  ->  MODEL  ->  prefix
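The slides suggest the model emits the body one token at a time. A rough Python sketch of such a generation loop follows. The `stub_model` function is a hypothetical stand-in that replays the known answer, and the `END` marker is an assumed stop signal; a trained model would instead score every token in its vocabulary at each step.

```python
END = "END"  # assumed stop marker

# The known body of greet(), split into tokens, followed by the stop marker.
TARGET = (
    "{ const prefix = name . length < 10 ? 'Hi' : 'Hello' ; "
    "return prefix + name ; } END"
).split()

def stub_model(signature, generated):
    """Hypothetical model: returns the next token given what was generated so far."""
    return TARGET[len(generated)]

def generate(signature, model, max_tokens=50):
    """Ask the model for one token at a time until it emits END."""
    generated = []
    for _ in range(max_tokens):
        token = model(signature, generated)
        if token == END:
            break
        generated.append(token)
    return generated

body = generate("function greet(name: string)", stub_model)
```

The `max_tokens` limit is there so that a model which never produces `END` cannot loop forever.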


ML Approach
1. Gather Data
2. Clean Data
3. Choose Model
4. Training
5. Evaluation





1. Gathering Data
How can we quickly gather a lot of function examples?
Look at open source projects on GitHub


Using Google BigQuery, we can run a SQL query that fetches all the code on GitHub in under a minute!
We kept only the TypeScript files, extracted 324,280 TypeScript functions from them, and collected the functions in one huge JSON file.


2. Cleaning Data

function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}

  1. Preprocess raw dataset
  2. Prepare model inputs

Split signature from body

function greet(name: string)
{
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}


Rename function parameters

function greet($arg0$: string)
{
  const prefix = $arg0$.length < 10 ? 'Hi' : 'Hello';
  return prefix + $arg0$;
}


Rename identifiers and literals

function greet($arg0$: string)
{
  const id0 = $arg0$.id1 < 2 ? '3' : '4';
  return id0 + $arg0$;
}


Space tokens

function greet ( $arg0$ : string )
{
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
}


Add START and END symbols

function greet ( $arg0$ : string )
START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END
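The preprocessing steps shown so far (split signature from body, rename parameters, rename identifiers and literals, space tokens, add START and END) can be sketched in Python as follows. The slides do not say how the original pipeline was implemented; this version uses a deliberately simplified regex tokenizer and a tiny keyword list, both of which are assumptions, and a real pipeline would more likely rely on a TypeScript parser.

```python
import re

# Simplified tokenizer (an assumption): quoted strings, identifiers, numbers,
# then any other single non-space character.
TOKEN = re.compile(r"'[^']*'|\"[^\"]*\"|[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")
# Small, incomplete set of words that are kept as-is.
KEYWORDS = {"function", "const", "let", "var", "return", "if", "else"}

def preprocess(source):
    """Return (signature, body) as space-separated, renamed token strings."""
    signature, rest = source.split("{", 1)  # split signature from body
    # Parameter names are assumed to be the words right before a type annotation.
    params = re.findall(r"(\w+)\s*:", signature)
    args = {name: f"$arg{i}$" for i, name in enumerate(params)}
    renamed = {}  # identifier or literal -> placeholder

    def rename(token):
        if token in args:
            return args[token]
        is_word_or_literal = token[0].isalnum() or token[0] in "_'\""
        if token in KEYWORDS or not is_word_or_literal:
            return token
        if token not in renamed:
            n = len(renamed)  # one shared counter for identifiers and literals
            if token[0] in "'\"":
                renamed[token] = f"'{n}'"
            elif token[0].isdigit():
                renamed[token] = str(n)
            else:
                renamed[token] = f"id{n}"
        return renamed[token]

    signature_tokens = [args.get(t, t) for t in TOKEN.findall(signature)]
    body_tokens = [rename(t) for t in TOKEN.findall("{" + rest)]
    return " ".join(signature_tokens), " ".join(["START", *body_tokens, "END"])

SOURCE = """function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}"""
signature, body = preprocess(SOURCE)
```

Multi-character operators such as `===` would be split into single characters by this tokenizer, which is one reason to treat it as a sketch only.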


Create Model Inputs and Outputs

Input                                                Output
function greet ( $arg0$ : string ) START             {
function greet ( $arg0$ : string ) START {           const
function greet ( $arg0$ : string ) START { const     id0
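One way to produce such input/output pairs is to slide over the body tokens and treat everything seen so far as the input. The helper name `make_training_pairs` is made up for this sketch.

```python
def make_training_pairs(signature_tokens, body_tokens):
    """Pair every prefix (signature + body so far) with the token that follows it."""
    pairs = []
    for i in range(1, len(body_tokens)):
        context = signature_tokens + body_tokens[:i]
        pairs.append((context, body_tokens[i]))
    return pairs

signature = "function greet ( $arg0$ : string )".split()
body = (
    "START { const id0 = $arg0$ . id1 < 2 ? '3' : '4' ; "
    "return id0 + $arg0$ ; } END"
).split()

pairs = make_training_pairs(signature, body)
first_outputs = [next_token for _, next_token in pairs[:3]]
```

A single function therefore yields one training example per body token, ending with the example whose output is `END`.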


Building a dictionary with all tokens

function greet ( $arg0$ : string )
START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END

dict = {
    'function': 1,
    'greet': 2,
    '(': 3,
    '$arg0$': 4,
    ':': 5,
    'string': 6,
    ')': 7,
    'START': 8,
    '{': 9,
    ...
}
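A dictionary like this can be built by numbering tokens in order of first appearance. In the sketch below, numbering starts at 1 on the assumption that 0 is kept free for padding; the function name is illustrative.

```python
def build_dictionary(token_lists):
    """Assign each distinct token the next free integer, starting at 1."""
    dictionary = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in dictionary:
                dictionary[token] = len(dictionary) + 1  # 0 is left for padding
    return dictionary

tokens = "function greet ( $arg0$ : string ) START { const".split()
dictionary = build_dictionary([tokens])
```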


Text to Sequence

function greet ( $arg0$ : string )    ->  [1, 2, 3, 4, 5, 6, 7]
function isPrime ( $arg0$ : number )  ->  [1, 13, 3, 4, 5, 23, 7]

Add Padding

[1, 2, 3, 4, 5, 6, 7]  ->  [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7]
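Both steps are short in Python. The dictionary below contains only the entries visible on the slides, and the helper names are made up for this sketch. Padding is added on the left, as in the slide.

```python
# Only the entries that appear on the slides.
dictionary = {
    "function": 1, "greet": 2, "(": 3, "$arg0$": 4, ":": 5,
    "string": 6, ")": 7, "isPrime": 13, "number": 23,
}

def text_to_sequence(text, dictionary):
    """Replace every space-separated token with its dictionary number."""
    return [dictionary[token] for token in text.split()]

def pad_left(sequence, length, pad_value=0):
    """Left-pad with zeros to a fixed positive length, keeping the last tokens if too long."""
    return [pad_value] * (length - len(sequence)) + sequence[-length:]

greet = text_to_sequence("function greet ( $arg0$ : string )", dictionary)
is_prime = text_to_sequence("function isPrime ( $arg0$ : number )", dictionary)
padded = pad_left(greet, 14)
```

Padding gives every input the same length, which is what lets sequences of different sizes be batched together.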


Encode Output

Next Token (Y)    Index    One Hot Encoding
{                 9        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
string            6        [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
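One-hot encoding turns a token number into a vector that is all zeros except for a single 1 at that position. A minimal sketch, using a vector of size 10 as on the slide (a real vocabulary would be much larger):

```python
def one_hot(index, size):
    """Vector of zeros with a single 1 at the given (0-based) position."""
    vector = [0] * size
    vector[index] = 1
    return vector

brace = one_hot(9, 10)    # token '{'
string = one_hot(6, 10)   # token 'string'
```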


3. Choose Model
Look at Similar Problems


Machine Translation


Using TensorFlow







4. Training the Model
Google Colab


Google Cloud TPU (Tensor Processing Unit)


5. Evaluation
Evaluating the performance of the model
DEMO TIME


Takeaways
- Take advantage of the cloud
- Look for solutions to similar problems
- Data processing makes up a big chunk of the work




Learn more
https://urish.org
It's Alive! Machine Learning Writes Your Code (AngularUP)
By Uri Shaked
