Machine Learning from a Developer's POV
Presenter: Simone Scardapane
Webinar, Italy Big Data & Machine Learning Meetup, 21 July 2017
Something about me
- Post-doc fellow at Sapienza University of Rome
- Strong interest in ML for everyone, especially developers
- Co-organizer of the Rome Machine Learning & Data Science Meetup
- Program committee for Codemotion
Software trend 1: simpler ML libraries
Can we predict the skill of a player?
import numpy as np
np.random.seed(256)
# Let us load some data!
import pandas as pd
data = pd.read_csv('./Data/SkillCraft1_Dataset.csv', na_values=['?'])
data.iloc[0]  # first row, by position (.ix was removed in pandas 1.0)
Load data
GameID 52.000000
LeagueIndex 5.000000
Age 27.000000
HoursPerWeek 10.000000
TotalHours 3000.000000
SelectByHotkeys 0.003515
AssignToHotkeys 0.000220
UniqueHotkeys 7.000000
MinimapAttacks 0.000110
MinimapRightClicks 0.000392
ActionLatency 40.867300
TotalMapExplored 28.000000
WorkersMade 0.001397
UniqueUnitsMade 6.000000
ComplexUnitsMade 0.000000
ComplexAbilitiesUsed 0.000000
Name: 0, dtype: float64
Thompson, J.J., Blair, M.R., Chen, L. and Henrey, A.J., 2013. Video game telemetry as a critical tool in the study of complex skill learning. PLoS ONE, 8(9), p.e75129.
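The SkillCraft file marks missing entries with '?' rather than leaving them empty, which is why the loading code passes na_values. A minimal sketch of what that option does, using a tiny made-up CSV held in memory (the rows below are hypothetical, not taken from the real dataset):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row CSV in the same spirit as the SkillCraft file:
# missing values are written as '?' instead of being left blank.
csv_text = """GameID,LeagueIndex,Age,HoursPerWeek
52,5,27,10
55,5,?,10
56,4,30,?
"""

# na_values lists the strings pandas should parse as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values=['?'])

print(df.dtypes)        # columns containing NaN are parsed as float64
print(df.isna().sum())  # number of missing entries per column

# Positional access to the first row (the modern replacement for .ix).
first_row = df.iloc[0]
```

Without na_values, the Age and HoursPerWeek columns would be read as strings (object dtype), and the numeric preprocessing that follows would fail on them.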
# We fill in the missing values of the dataset
# by replacing them with the mean of each column
from sklearn.impute import SimpleImputer
data = pd.DataFrame(SimpleImputer(strategy='mean').fit_transform(data),
                    columns=data.columns)
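The imputation step fills each missing entry with a per-column statistic. In current scikit-learn, SimpleImputer (the successor of preprocessing.Imputer) picks that statistic through its strategy parameter. A small sketch on a hypothetical 4x2 array contrasting the default 'mean' with 'most_frequent':

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy matrix: one missing value in each column.
X = np.array([[1.0, 10.0],
              [np.nan, 10.0],
              [3.0, np.nan],
              [3.0, 40.0]])

# 'mean': replace NaN with the mean of the observed values in that column.
X_mean = SimpleImputer(strategy='mean').fit_transform(X)

# 'most_frequent': replace NaN with the most common observed value (the mode).
X_mode = SimpleImputer(strategy='most_frequent').fit_transform(X)

print(X_mean)
print(X_mode)
```

For column 0 the observed values are 1, 3, 3, so the mean strategy fills in 7/3 while most_frequent fills in 3; for column 1 (10, 10, 40) the fills are 20 and 10 respectively. The two strategies can differ a lot on skewed columns such as TotalHours.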
# We train a random forest to predict the league of a player
# (column 1) from the columns after GameID and LeagueIndex,
# holding out the first row
from sklearn import ensemble
rf = ensemble.RandomForestClassifier()\
.fit(data.values[1:, 2:], data.values[1:, 1])
Train a model!
print('Predicted league is:', rf.predict(data.values[0, 2:].reshape(1, -1)))
Predicted league is: [ 5.]
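scikit-learn estimators expect a 2-D input of shape (n_samples, n_features), which is why the held-out player is passed to predict after reshape(1, -1). A self-contained sketch of the same train/predict pattern on hypothetical synthetic data (100 samples, 3 features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Hypothetical toy data: the label is 1 when the first feature is positive.
X = rng.randn(100, 3)
y = (X[:, 0] > 0).astype(int)

# Hold out row 0 and train on the remaining rows, as on the slide.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[1:], y[1:])

single = X[0]                  # shape (3,): a bare 1-D feature vector
batch = single.reshape(1, -1)  # shape (1, 3): a batch with one sample
prediction = rf.predict(batch) # one predicted label per input row

print('Predicted class is:', prediction)
```

Passing the 1-D vector directly raises a ValueError asking for a 2-D array; reshape(1, -1) lets NumPy infer the number of feature columns.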
Automated machine learning (AutoML)
import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(data.values[1:, 2:], data.values[1:, 1])
y_hat = automl.predict(data.values[0, 2:].reshape(1, -1))
An overview of the AutoML system taken from: Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M. and Hutter, F., 2015. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems (pp. 2962-2970).
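At its core, AutoML automates the choice between candidate models. auto-sklearn does this with Bayesian optimization over a large space of pipelines, warm-started by meta-learning and followed by ensemble construction, as described in the Feurer et al. paper. The sketch below is a much simpler stand-in, not auto-sklearn's method: it scores a hand-picked, hypothetical list of three scikit-learn models with 5-fold cross-validation on the bundled Iris dataset and keeps the best one.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical, fixed search space (auto-sklearn explores a far larger one).
candidates = {
    'logistic_regression': LogisticRegression(max_iter=1000),
    'knn_5': KNeighborsClassifier(n_neighbors=5),
    'random_forest': RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}

# Keep the best-scoring candidate and refit it on all the data.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)

print(best_name, round(scores[best_name], 3))
```

Exhaustively scoring a fixed list stops scaling once hyperparameters enter the picture, which is the gap the Bayesian-optimization search in auto-sklearn is meant to close.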
Software trend 2: feasible deep learning
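# Create a simple Keras model (recent Keras 2.3+ / 3.x API, default
# channels_last format: one grayscale 50x50 image has shape (50, 50, 1))
from keras.models import Sequential
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from keras.regularizers import l2
from keras.optimizers import SGD
model = Sequential()
model.add(Input(shape=(50, 50, 1)))
model.add(Conv2D(6, (3, 3), activation='relu'))
model.add(Conv2D(3, (3, 3), strides=(2, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid', kernel_regularizer=l2(0.1)))
# Compile the model: binary cross-entropy for a 0/1 target,
# optimized with SGD using Nesterov momentum
sgd = SGD(learning_rate=0.01, momentum=0.8, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()  # prints the table itself and returns None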
Building models in Keras
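The layer shapes and parameter counts that model.summary() reports can be worked out by hand. A sketch of that arithmetic in plain Python, assuming a single-channel 50x50 input and the Keras defaults of 'valid' padding and a pooling stride equal to the pool size:

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis with 'valid' (no) padding."""
    return (size - kernel) // stride + 1


def summarize():
    h, w, c = 50, 50, 1  # assumed grayscale 50x50 input
    rows = []

    # Conv2D(6, (3, 3)): each of the 6 filters has 3*3*c weights plus a bias.
    params = 3 * 3 * c * 6 + 6
    h, w, c = conv_output_size(h, 3), conv_output_size(w, 3), 6
    rows.append(('conv2d_1', (h, w, c), params))

    # Conv2D(3, (3, 3), strides=(2, 2)): 3 filters over the 6 input channels.
    params = 3 * 3 * c * 3 + 3
    h, w, c = conv_output_size(h, 3, 2), conv_output_size(w, 3, 2), 3
    rows.append(('conv2d_2', (h, w, c), params))

    # MaxPooling2D(pool_size=(2, 2)): stride defaults to 2, no weights.
    h, w = conv_output_size(h, 2, 2), conv_output_size(w, 2, 2)
    rows.append(('max_pooling2d', (h, w, c), 0))

    # Flatten and Dropout only reshape or mask values: no weights.
    flat = h * w * c
    rows.append(('flatten', (flat,), 0))
    rows.append(('dropout', (flat,), 0))

    # Dense(1): one weight per flattened input plus a single bias.
    rows.append(('dense', (1,), flat + 1))
    return rows


rows = summarize()
for name, shape, params in rows:
    print(f'{name:15s} {str(shape):15s} {params}')
print('Total params:', sum(p for _, _, p in rows))
```

The strided convolution and the pooling layer shrink the 48x48 feature maps to 11x11 before flattening, so the whole network ends up with only a few hundred trainable weights.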
Software trend 3: modular DL
Even more trends!
Machine learning on mobile:
https://developer.apple.com/documentation/coreml
Machine learning as a service:
https://cloud.google.com/ml-engine/
Reinforcement learning:
Words of caution...
McDaniel, P., Papernot, N. and Celik, Z.B., 2016. Machine learning in adversarial settings. IEEE Security & Privacy, 14(3), pp. 68-72.
ML offers a fantastically powerful toolkit for building useful complex prediction systems quickly. ... it is dangerous to think of these quick wins as coming for free. ... it is common to incur massive ongoing maintenance costs in real-world ML systems. [Risk factors include] boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
Machine Bias [ProPublica]
There is a blind spot in AI research [Nature]
Will ML replace programmers?
DeepCoder: Learning to Write Programs [arXiv preprint]
Thanks for listening!
Slides for the following webinar: https://www.bigmarker.com/italy-big-data-machine-learn/Machine-Learning-from-a-Developer-s-POV-Airbnb-s-Airflow