PV226 ML: AutoML

Content of this session

ML frameworks - comparison and experience

AutoML

How to evaluate an ML model

Before we start:

What does a model look like?

It must be represented in some way

The first layer is the input and the last layer is the output
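To make the layered representation concrete, here is a minimal sketch in plain Python. It is not how any particular framework stores a model; the names `dense`, `relu`, `forward` and the weights are hypothetical and chosen only for illustration.

```python
# Hypothetical toy model: a list of dense layers, each a (weights, biases) pair.

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # weights has one row per output unit; each row has one entry per input.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Two inputs -> hidden layer with two units -> one output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 2.0]], [0.1]),                    # output layer
]

def forward(inputs, layers):
    activations = inputs  # the input layer is just the data itself
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # no activation on the output layer
            activations = relu(activations)
    return activations

prediction = forward([2.0, 1.0], layers)
print(prediction)
```

Frameworks keep the same idea (data flows from the input layer through to the output layer) but add training, hardware acceleration, and serialization on top.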

The result is an application: a containerised application.

Later challenges: scaling

ML Frameworks

ML frameworks are used to create such a model.

TensorFlow

  • Low-level library (with some high-level interfaces)
  • Good for general ML tasks
  • Works on CPU and GPU, and on all major operating systems
  • Developed by Google

Keras

  • High level abstraction on top of TensorFlow
  • Focused on neural networks
  • Huge ecosystem

PyTorch

  • Competition to TensorFlow
  • Easier debugging and more customisation
  • Huge ecosystem but not as wide as TF

There are also Julia frameworks: Knet and Flux

And AutoML tools from:

Amazon: SageMaker

  • price-wise ok
  • results are not as good as Google's or Microsoft's
  • Rapidly growing
  • Good pretrained models for ecommerce

Google: AutoML Tables

  • only NN or decision trees
  • best results of the top three cloud providers
  • most expensive (= too expensive for my taste)

Microsoft: ML Studio

  • many different algorithms
  • results are less stable, but good enough
  • cheap

AutoML will do all the work for you

  • it will prepare data
  • try different algorithms
  • prepare container
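The "try different algorithms" step can be sketched as a simple search loop: fit each candidate on training data and keep the one with the lowest validation error. This is a toy illustration in plain Python, not how any cloud AutoML product works internally; all function names and the data are hypothetical.

```python
from statistics import mean

def fit_mean(xs, ys):
    # Baseline candidate: always predict the mean of the training targets.
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Second candidate: ordinary least squares for y = slope * x + intercept.
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def auto_select(candidates, train, valid):
    # Fit every candidate, score it on held-out data, return the best name.
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

train = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
valid = ([5.0, 6.0], [10.0, 12.0])
best_name, scores = auto_select({"mean": fit_mean, "line": fit_line}, train, valid)
print(best_name, scores)
```

Real AutoML tools search a much larger space (preprocessing, architectures, hyperparameters), but the selection principle is the same.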

AutoKeras: an AutoML system based on Keras.

Installation

pip install autokeras

Requires Python >= 3.5 and < 3.9, and TensorFlow >= 2.3.0

Supported Tasks

  • Image Classification
  • Image Regression
  • Text Classification
  • Text Regression
  • Structured Data Classification
  • Structured Data Regression

Working with AutoKeras

from sklearn.datasets import fetch_california_housing
import numpy as np
import pandas as pd
import tensorflow as tf
import autokeras as ak

house_dataset = fetch_california_housing()
df = pd.DataFrame(
    np.concatenate((
        house_dataset.data, 
        house_dataset.target.reshape(-1,1)),
        axis=1),
    columns=house_dataset.feature_names + ['Price'])
train_size = int(df.shape[0] * 0.9)

df[:train_size].to_csv('train.csv', index=False)
df[train_size:].to_csv('eval.csv', index=False)

train_file_path = 'train.csv'
test_file_path = 'eval.csv'

prepare data

Working with AutoKeras

# Initialize the structured data regressor.
reg = ak.StructuredDataRegressor(
    overwrite=True,
    max_trials=3)  # It tries 3 different models.

# Feed the structured data regressor with training data.
reg.fit(
    # The path to the train.csv file.
    train_file_path,
    # The name of the label column.
    'Price',
    epochs=10)
    
# Predict with the best model.
predicted_y = reg.predict(test_file_path)

# Evaluate the best model with testing data.
print(reg.evaluate(test_file_path, 'Price'))

train and evaluate

Let's say we have created a model. How do we evaluate it?

Classification

Confusion matrix    Data positive    Data negative
Model positive      a                b
Model negative      c                d

Accuracy = (a+d)/(a+b+c+d)

Sensitivity (Recall) = a/(a+c)  proportion of positive cases correctly identified

Specificity = d/(b+d) proportion of negative cases correctly identified
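As a worked example of the three formulas above, the following sketch counts a, b, c and d from made-up labels (1 = positive, 0 = negative) and then applies the definitions. The helper name `confusion_counts` and the data are illustrative assumptions.

```python
def confusion_counts(actual, predicted):
    """Count a, b, c, d as laid out in the confusion matrix (1 = positive)."""
    pairs = list(zip(actual, predicted))
    a = sum(1 for y, p in pairs if y == 1 and p == 1)  # model positive, data positive
    b = sum(1 for y, p in pairs if y == 0 and p == 1)  # model positive, data negative
    c = sum(1 for y, p in pairs if y == 1 and p == 0)  # model negative, data positive
    d = sum(1 for y, p in pairs if y == 0 and p == 0)  # model negative, data negative
    return a, b, c, d

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

a, b, c, d = confusion_counts(actual, predicted)
accuracy = (a + d) / (a + b + c + d)
sensitivity = a / (a + c)
specificity = d / (b + d)
print(a, b, c, d, accuracy, sensitivity, specificity)
```

Here a = 3, b = 1, c = 1 and d = 5, so accuracy is 8/10, sensitivity 3/4 and specificity 5/6.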

Confusion Matrix

Precision = a/(a+b) proportion of positive predictions that are correct

F1 = 2*((precision*recall)/(precision+recall))
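A short sketch of the F1 formula in terms of the confusion-matrix cells, taking precision as a/(a+b) and recall as a/(a+c). The function name and the counts are made up for illustration, and the sketch assumes a > 0 so that no division by zero occurs.

```python
def f1_score(a, b, c):
    # a = true positives, b = false positives, c = false negatives.
    precision = a / (a + b)
    recall = a / (a + c)
    return 2 * (precision * recall) / (precision + recall)

score = f1_score(a=6, b=2, c=4)
print(round(score, 4))
```

With these counts precision is 0.75 and recall is 0.6; F1 is their harmonic mean, which is pulled towards the lower of the two.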

Matthews correlation coefficient

value from -1 to 1

0 means no better than random guessing
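For reference, the Matthews correlation coefficient can be computed from the same four cells as MCC = (a*d - b*c) / sqrt((a+b)(a+c)(d+b)(d+c)). The sketch below is a plain-Python illustration; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption.

```python
from math import sqrt

def matthews_corrcoef(a, b, c, d):
    # a, b, c, d follow the confusion-matrix layout used on the slide.
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (a + c) * (d + b) * (d + c))
    # Assumed convention: an empty row or column gives a score of 0.
    return numerator / denominator if denominator else 0.0

print(matthews_corrcoef(3, 1, 1, 5))   # mostly correct predictions
print(matthews_corrcoef(5, 0, 0, 5))   # perfect predictions
print(matthews_corrcoef(0, 5, 5, 0))   # every prediction wrong
print(matthews_corrcoef(2, 2, 2, 2))   # indistinguishable from chance
```

Unlike accuracy, MCC uses all four cells, which makes it more informative on imbalanced data.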

Regression

Root Mean Squared Error

RMSE is probably the most popular formula to measure the error rate of a regression model.
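RMSE is the square root of the mean of the squared differences between predicted and actual values. A minimal sketch with made-up numbers (the function name and data are illustrative):

```python
from math import sqrt

def rmse(actual, predicted):
    squared_errors = [(p - y) ** 2 for y, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
error = rmse(actual, predicted)
print(round(error, 4))
```

Because the errors are squared before averaging, a single large error raises RMSE more than several small ones.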
 

Relative Squared Error

Relative squared error (RSE) can be compared between models whose errors are measured in different units.
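RSE is commonly defined as the model's total squared error divided by the total squared error of a baseline that always predicts the mean of the actual values. The sketch below assumes that definition; the names and data are illustrative.

```python
def relative_squared_error(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    model_error = sum((p - y) ** 2 for y, p in zip(actual, predicted))
    baseline_error = sum((y - mean_actual) ** 2 for y in actual)
    return model_error / baseline_error

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
rse = relative_squared_error(actual, predicted)
print(round(rse, 4))
```

The units cancel in the ratio, which is why the value is comparable across targets measured on different scales. A value below 1 means the model beats the mean baseline.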

Mean Absolute Error

The mean absolute error (MAE) has the same unit as the original data, and it can only be compared between models whose errors are measured in the same units. It is usually similar in magnitude to RMSE, but slightly smaller.
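A small sketch of MAE next to RMSE on the same made-up data, to illustrate that MAE does not exceed RMSE. Function names and numbers are illustrative.

```python
from math import sqrt

def mae(actual, predicted):
    return sum(abs(p - y) for y, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return sqrt(sum((p - y) ** 2 for y, p in zip(actual, predicted)) / len(actual))

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
print(mae_value, round(rmse_value, 4))
```

The two values are equal only when every absolute error has the same size; the more the errors vary, the wider the gap.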

Relative Absolute Error

Like RSE, the relative absolute error (RAE) can be compared between models whose errors are measured in different units.
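RAE is commonly defined like RSE, but with absolute instead of squared differences: the model's total absolute error divided by that of a baseline predicting the mean. The sketch assumes that definition; names and data are illustrative.

```python
def relative_absolute_error(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - y) for y, p in zip(actual, predicted))
    baseline_error = sum(abs(y - mean_actual) for y in actual)
    return model_error / baseline_error

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
rae = relative_absolute_error(actual, predicted)
print(round(rae, 4))
```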

Standardized Residuals (Errors) Plot
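The values shown in such a plot can be computed by dividing each residual by the standard deviation of all residuals. This is a simplified sketch: it ignores the leverage correction that statistical packages often apply, and the names and data are illustrative.

```python
from statistics import stdev

def standardized_residuals(actual, predicted):
    residuals = [y - p for y, p in zip(actual, predicted)]
    scale = stdev(residuals)  # sample standard deviation of the residuals
    return [r / scale for r in residuals]

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
z = standardized_residuals(actual, predicted)
# A common rule of thumb (an assumption here) flags |z| > 2 as a possible outlier.
suspicious = [i for i, value in enumerate(z) if abs(value) > 2]
print([round(value, 3) for value in z], suspicious)
```

Plotting these values against the predictions helps to spot outliers and patterns the model has missed.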

Now let's get the model

Exporting model

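# Export the best model found by the search as a Keras Model.
model = reg.export_model()

print(type(model))  # <class 'tensorflow.python.keras.engine.training.Model'>

try:
    # Preferred: TensorFlow SavedModel format.
    model.save("model_autokeras", save_format="tf")
except Exception:
    # Fallback: single-file HDF5 format.
    model.save("model_autokeras.h5")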

Importing model

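import tensorflow as tf
import autokeras as ak
from tensorflow.keras.models import load_model

# ak.CUSTOM_OBJECTS lets Keras deserialize the AutoKeras-specific layers.
loaded_model = load_model("model_autokeras", custom_objects=ak.CUSTOM_OBJECTS)

# x_test is assumed to hold the test inputs; it is not defined on these slides.
# tf.expand_dims adds a trailing axis (e.g. a channel axis for grayscale images).
predicted_y = loaded_model.predict(tf.expand_dims(x_test, -1))
print(predicted_y)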

Topics for discussion:

  • data storage
  • computation power
  • GPUs vs CPUs
  • cloud services
  • ML pipelines

Any questions?

PV226: AutoML

By Lukáš Grolig
