Machine Learning from a Developer's POV

Presenter: Simone Scardapane

Webinar, Italy Big Data & Machine Learning Meetup, 21 July 2017

Something about me

Post-doc fellow at Sapienza University of Rome
Strong interest in ML for everyone, especially developers
Co-organizer of the Rome Machine Learning & Data Science Meetup
Program committee member for Codemotion

The age of analytics: Competing in a data-driven world (McKinsey Report)

Software trend 1: simpler ML libraries

Can we predict the skill of a player?

import numpy as np
np.random.seed(256)

# Let us load some data!
import pandas as pd
data = pd.read_csv('./Data/SkillCraft1_Dataset.csv', na_values='?')
data.iloc[0]

Load data

GameID                    52.000000
LeagueIndex                5.000000
Age                       27.000000
HoursPerWeek              10.000000
TotalHours              3000.000000
SelectByHotkeys            0.003515
AssignToHotkeys            0.000220
UniqueHotkeys              7.000000
MinimapAttacks             0.000110
MinimapRightClicks         0.000392
ActionLatency             40.867300
TotalMapExplored          28.000000
WorkersMade                0.001397
UniqueUnitsMade            6.000000
ComplexUnitsMade           0.000000
ComplexAbilitiesUsed       0.000000
Name: 0, dtype: float64

Thompson, J.J., Blair, M.R., Chen, L. and Henrey, A.J., 2013. Video game telemetry as a critical tool in the study of complex skill learning. PLoS ONE, 8(9), p.e75129.

# We remove missing values from the dataset
# by replacing them with the column mean (the imputer's default)
from sklearn.impute import SimpleImputer
data = pd.DataFrame(SimpleImputer().fit_transform(data), columns=data.columns)

# We train a random forest to predict
# the league of a player (row 0 is held out)
from sklearn import ensemble
rf = ensemble.RandomForestClassifier()
rf.fit(data.values[1:, 2:], data.values[1:, 1])

Train a model!
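
To make the imputation step concrete, here is a minimal NumPy sketch of mean imputation on a toy array. The values are made up for illustration and are not taken from the SkillCraft dataset. It mirrors what a mean-strategy imputer computes: each missing entry is replaced by the mean of the observed values in its column.

```python
import numpy as np

# Toy feature matrix with missing entries (hypothetical values)
X = np.array([[27.0, 10.0],
              [np.nan, 20.0],
              [23.0, np.nan]])

# Column means computed over the observed (non-NaN) values only
col_means = np.nanmean(X, axis=0)

# Replace each NaN with the mean of its column
X_imputed = np.where(np.isnan(X), col_means, X)
print(X_imputed)
```

Other strategies (median, most frequent value) follow the same pattern with a different per-column statistic.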

print('Predicted league is:', rf.predict(data.values[0, 2:].reshape(1, -1)))

Predicted league is: [ 5.]
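
A single prediction says little about how reliable the model is. As a hedged sketch, the snippet below estimates accuracy with 5-fold cross-validation. It uses synthetic data from `make_classification` as a stand-in, since the SkillCraft CSV is not bundled here; the sample count, the number of classes and `n_estimators=50` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the player features and league labels
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0)

# Accuracy on 5 held-out folds: each fold is scored by a model
# trained on the remaining four
scores = cross_val_score(rf, X, y, cv=5)
print('Mean accuracy: %.3f (+/- %.3f)' % (scores.mean(), scores.std()))
```

On the real dataset the same call would take the feature columns and the `LeagueIndex` column in place of `X` and `y`.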

Auto machine learning

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(data.values[1:, 2:], data.values[1:, 1])
y_hat = automl.predict(data.values[0, 2:].reshape(1, -1))

An overview of the AutoML system taken from: Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M. and Hutter, F., 2015. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems (pp. 2962-2970).
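
The core idea, searching automatically over model configurations and keeping the best one, can be sketched with plain scikit-learn. This is not what auto-sklearn does internally (Feurer et al. combine Bayesian optimization with meta-learning and ensemble construction); it is a much simpler random search, run on synthetic data, with a search space chosen only for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space: sampled at random, not tuned
param_distributions = {
    'n_estimators': randint(10, 50),
    'max_depth': randint(2, 8),
}

# Try 5 random configurations, score each with 3-fold cross-validation
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

After fitting, `search` behaves like the best model found, so `search.predict` can be called directly.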

Software trend 2: feasible deep learning

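# Create a simple Keras model
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from keras.optimizers import SGD
from keras.regularizers import l2

# input_shape=(1, 50, 50) assumes the channels-first image format;
# with channels-last (the TensorFlow default) use (50, 50, 1)
model = Sequential()
model.add(Conv2D(6, (3, 3), input_shape=(1, 50, 50), activation='relu'))
model.add(Conv2D(3, (3, 3), strides=(2, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid', kernel_regularizer=l2(0.1)))

# Compile the model
sgd = SGD(learning_rate=0.01, momentum=0.8, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()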

Building models in Keras
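
To see what the model summary should report, the layer sizes can be worked out by hand. The sketch below does the arithmetic in plain Python, assuming a single-channel 50x50 input and Keras' default 'valid' (unpadded) convolutions; the helper `conv_out` is a hypothetical name introduced here.

```python
def conv_out(size, kernel, stride=1):
    """Output size of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1

side = 50
side = conv_out(side, 3)             # Conv2D(6, (3, 3)): 50 -> 48
conv1_params = (3 * 3 * 1 + 1) * 6   # 3x3 weights on 1 channel + bias, 6 filters
side = conv_out(side, 3, stride=2)   # Conv2D(3, (3, 3), strides=(2, 2)): 48 -> 23
conv2_params = (3 * 3 * 6 + 1) * 3   # 3x3 weights on 6 channels + bias, 3 filters
side = conv_out(side, 2, stride=2)   # MaxPooling2D((2, 2)): 23 -> 11
flat = 3 * side * side               # Flatten: 3 channels of 11x11
dense_params = flat + 1              # Dense(1): one weight per input + bias
total = conv1_params + conv2_params + dense_params
print(side, flat, total)
```

Pooling, flattening and dropout add no trainable parameters, so the total comes only from the two convolutions and the final dense layer.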

Software trend 3: modular DL

http://www.kdnuggets.com/2017/01/generative-adversarial-networks-hot-topic-machine-learning.html

Learning to Discover Cross-Domain Relations with Generative Adversarial Networks, ICML 2017

Even more trends!

Machine learning on mobile:

https://developer.apple.com/documentation/coreml

Machine learning as a service:

https://cloud.google.com/ml-engine/

Reinforcement learning:

Universe (OpenAI)

Words of caution...

Breaking linear classifiers on ImageNet

McDaniel, P., Papernot, N. and Celik, Z.B., 2016. Machine learning in adversarial settings. IEEE Security & Privacy, 14(3), pp. 68-72.
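
How a linear classifier can be fooled is easy to sketch in NumPy. The weights and input below are random rather than a trained model, and the perturbation rule (a small step against the sign of each weight) is the fast-gradient-sign idea applied to a linear score; it is offered as an illustration, not as the specific method of the cited article.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 1000
w = rng.normal(size=dim)   # weights of a hypothetical linear classifier
x = rng.normal(size=dim)   # an arbitrary input
if w @ x < 0:
    x = -x                 # make sure x starts in the positive class

def predict(v):
    """Class 1 if the linear score is positive, else class 0."""
    return int(w @ v > 0)

score = w @ x
# Moving every feature by eps against sign(w) lowers the score by
# eps * ||w||_1; pick eps slightly larger than the minimum needed to flip
eps = 1.1 * score / np.abs(w).sum()
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv), eps)
```

Each feature moves by only `eps`, yet the predicted class flips, because many tiny changes aligned with the weights add up in high dimension.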

ML offers a fantastically powerful toolkit for building useful complex prediction systems quickly. ... it is dangerous to think of these quick wins as coming for free. ... it is common to incur massive ongoing maintenance costs in real-world ML systems. [Risk factors include] boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.

Hidden Technical Debt in Machine Learning Systems (NIPS 2015)