Machine Learning from a Developer's POV

Presenter: Simone Scardapane

Something about me

  • Postdoctoral fellow at Sapienza University of Rome

  • Strong interest in ML for everyone, especially developers

  • Co-organizer of the Rome Machine Learning & Data Science Meetup

  • Program committee member for Codemotion

Software trend 1: simpler ML libraries

Can we predict the skill of a player?

import numpy as np

# Let us load some data!
import pandas as pd
data = pd.read_csv('./Data/SkillCraft1_Dataset.csv', na_values='?')

Load data

GameID                    52.000000
LeagueIndex                5.000000
Age                       27.000000
HoursPerWeek              10.000000
TotalHours              3000.000000
SelectByHotkeys            0.003515
AssignToHotkeys            0.000220
UniqueHotkeys              7.000000
MinimapAttacks             0.000110
MinimapRightClicks         0.000392
ActionLatency             40.867300
TotalMapExplored          28.000000
WorkersMade                0.001397
UniqueUnitsMade            6.000000
ComplexUnitsMade           0.000000
ComplexAbilitiesUsed       0.000000
Name: 0, dtype: float64

Thompson, J.J., Blair, M.R., Chen, L. and Henrey, A.J., 2013. Video game telemetry as a critical tool in the study of complex skill learning. PloS one, 8(9), p.e75129.

# We remove missing values from the dataset
# by replacing them with the column mean
# (the imputer's default strategy)
from sklearn.impute import SimpleImputer
imputed = SimpleImputer(strategy='mean').fit_transform(data.values)
data = pd.DataFrame(imputed, columns=data.columns)

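# We train a random forest to predict the league of a player
# (column 1) from the remaining features (columns 2 onwards);
# the first row is held out for the prediction example
from sklearn import ensemble
rf = ensemble.RandomForestClassifier()
rf.fit(data.values[1:, 2:], data.values[1:, 1])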

Train a model!
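Missing-value imputation comes in several flavours, for example filling each gap with the column mean or with the most frequent value. A minimal pure-Python sketch of the two strategies follows; the helper name `impute` and the toy values are hypothetical and are not taken from scikit-learn or from the SkillCraft data.

```python
from statistics import mean, mode

def impute(column, strategy="mean"):
    """Return a copy of `column` with None entries replaced by a fill value."""
    observed = [v for v in column if v is not None]
    # Pick the fill value from the observed entries only
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in column]

# Hypothetical toy column with one missing entry
hours_per_week = [10.0, 10.0, None, 20.0, 40.0]

mean_filled = impute(hours_per_week, "mean")
mode_filled = impute(hours_per_week, "most_frequent")
print(mean_filled)
print(mode_filled)
```

The two strategies can fill the same gap with quite different values, so it is worth checking which one a library applies by default.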

print('Predicted league is:', rf.predict(data.values[0, 2:].reshape(1, -1)))
Predicted league is: [ 5.]
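A single prediction says little about how good the model is. A common next step is to hold out a portion of the data and measure accuracy on it. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for the SkillCraft table; the sample sizes, number of classes, and random seeds are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 samples, 10 features, 3 classes (assumed values)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Keep 25% of the rows aside for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Accuracy on rows the model has never seen
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Held-out accuracy:', accuracy)
```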

Auto machine learning

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(data.values[1:, 2:], data.values[1:, 1])
y_hat = automl.predict(data.values[0, 2:].reshape(1, -1))

An overview of the AutoML system taken from: Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M. and Hutter, F., 2015. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems (pp. 2962-2970).

Software trend 2: feasible deep learning

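# Create a simple Keras model
# (input_shape=(1, 50, 50) assumes the 'channels_first' image data format)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.regularizers import l2
from keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(6, (3, 3), input_shape=(1, 50, 50), activation='relu'))
model.add(Conv2D(3, (3, 3), strides=(2, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # flatten the feature maps before the dense layer
model.add(Dense(1, activation='sigmoid', kernel_regularizer=l2(0.1)))

# Compile the model
sgd = SGD(lr=0.01, momentum=0.8, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])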

Building models in Keras
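To see what the layers above do to a 50x50 input, the spatial sizes can be worked out by hand. The sketch below assumes no padding (the 'valid' mode, which is the Keras default) and a pooling stride equal to the pool size; the helper `out_size` is a hypothetical name, not a Keras function.

```python
def out_size(size, window, stride=1):
    """Spatial output size of a convolution or pooling layer without padding."""
    return (size - window) // stride + 1

conv1 = out_size(50, window=3)               # Conv2D(6, (3, 3))
conv2 = out_size(conv1, window=3, stride=2)  # Conv2D(3, (3, 3), strides=(2, 2))
pool = out_size(conv2, window=2, stride=2)   # MaxPooling2D(pool_size=(2, 2))

# The second convolution produces 3 channels, so flattening the pooled
# feature maps would give this many inputs to a final dense layer
flat_features = pool * pool * 3

print(conv1, conv2, pool, flat_features)
```

Under these assumptions the image side shrinks from 50 to 48, then 23, then 11, leaving 363 values per image for the final layer.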

Software trend 3: modular DL

Even more trends!

Machine learning on mobile:


Machine learning as a service:


Reinforcement learning:

Universe (OpenAI)

Words of caution...

McDaniel, P., Papernot, N. and Celik, Z.B., 2016. Machine learning in adversarial settings. IEEE Security & Privacy, 14(3), pp. 68-72.

"ML offers a fantastically powerful toolkit for building useful complex prediction systems quickly. ... it is dangerous to think of these quick wins as coming for free. ... it is common to incur massive ongoing maintenance costs in real-world ML systems. [Risk factors include] boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns."

Sculley, D. et al., 2015. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems.

Will ML replace programmers?

Thanks for listening!
