Deep Learning in Practice with Keras

Toulouse Data Science
18 June 2018

Florient Chouteau

About.md

Deep Learning Engineer at Magellium

... training neural networks since 2016

... mostly aerial & satellite imagery

Florient Chouteau

We have one hour.

let's talk about...

... Deep Learning for computer vision ...

... in practice ...

... using Keras ...

WTF is... Deep Learning ?

Machine Learning and "Deep" Learning

Representation Learning

Hierarchical Representation Learning

Differentiable programming

# Define the neural network function y = x * w
def nn(x, weights): 
    return x * weights

# Define the cost function
def loss(y_true, y_pred): 
    return ((y_pred - y_true)**2).sum()

# define the gradient function.
def gradient(weights, x, y_true): 
    return 2 * x * (nn(x, weights) - y_true)

# define the update function delta w
def delta_w(w_k, x, y_true):
    dw =  gradient(w_k, x, y_true).sum()
    return dw

# Start performing the gradient descent updates
# (w, x, y_true, learning_rate and nb_of_iterations are assumed to be already defined)
for i in range(nb_of_iterations):
    # Get the delta w update
    dw = delta_w(w, x, y_true)
    # Update the current weight parameter
    w = w - learning_rate * dw  

# Et voilà !
  1. Express problem as computational graph
  2. Define differentiable loss function
  3. Optimize with gradient descent

In practice: Neural Networks

Stacking of:

  • linear units (fully connected to their inputs)
  • non-linear activation functions (see the sketch below)
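
A minimal sketch (not from the original slides) of such a stack in Keras, with arbitrary example sizes:

from keras.models import Sequential
from keras.layers import Dense, Activation

# Two "linear unit + non-linearity" blocks stacked on a 100-dimensional input
model = Sequential()
model.add(Dense(64, input_shape=(100,)))  # linear units, fully connected to the inputs
model.add(Activation('relu'))             # non-linear activation
model.add(Dense(10))
model.add(Activation('softmax'))          # output non-linearity for classification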

Convolutional Neural Networks

Use Cases

A LOT of things !

  • Most drastic changes in the way we do pattern recognition...
  • Natural language processing, translation...
  • Reinforcement learning (Go... Starcraft soon ?)

Examples in CV:

Practical Deep Learning

So you have a DL project...

Translate biz. obj. into ML obj. 

Get data

Define model

& training procedure

Build data pipeline

model.train()

Evaluate

Deploy

...then you need a DL framework...

  • Abstraction of hardware (CPU, GPU, TPU, Distributed....)
  • Data transformations & pipeline
  • Model declaration
  • Autograd/Autodiff
  • Mostly so you don't go insane rewriting convs in CuPy

... that suits you well

  • Ease of use / Low-levelness trade-off
  • Accessibility / Scalability trade-off
  • Adapted to your background

(please don't get offended if I forget one)

And also Matconvnet, DL4J, Dlib, and some more

static vs dynamic graphs

Pytorch

Keras

"Deep Learning for Humans"

What is Keras

  • High level Python API for Deep Learning
  • The same code runs on several backends: Theano (the original one), TensorFlow, CNTK, MXNet (fork), PlaidML (soon)
  • Created in mid 2015 by François Chollet @ Google

  • One of the most popular ways to do Deep Learning

  • Currently at version 2.2.0

(Not the deputy mayor of the same name)

Why (still) talk about Keras ?

  • One of the only ways of doing DL for actual humans
  • "Beginner only" reputation... unfounded ?
  • Huge impact in terms of UX over other frameworks
  • It's very likely you don't need more to solve your problem

(This is a personal opinion)

Thesis for what's next

If you are in a "humanly scaled" (e.g. not Google, not Facebook...) team or project, Keras is "the best" trade-off between accessibility, ease of use, extensibility, ability to scale up to bigger data, and ability to ship models to production environments.

1. Accessibility

First steps into Deep Learning

Installing Keras

# Install your backend

pip install tensorflow==1.8.0 

# (or tensorflow-gpu==1.8.0 or theano, or cntk...)


# Install keras

pip install keras==2.2.0

Note: I will not talk about backends other than tf because... I don't know anything about them.
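
A quick sanity check (a sketch): confirm the installed version and which backend Keras picked up.

python -c "import keras; print(keras.__version__)"
# Using TensorFlow backend.
# 2.2.0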

Why Keras was built in one example

# Tensorflow Low Level API

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.8})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], 
                            feed_dict={X: batch_x,
                                       Y: batch_y,
                                       keep_prob: 1.0})

            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

So you did a data science project in scikit-learn and you want to do DL ?

y = df2["Target"]
X = df2[features]
dt = DecisionTreeClassifier(min_samples_split=20, random_state=99)
dt.fit(X, y)

Why Keras was built in one example

# Tensorflow Estimator High Level API
model = tf.estimator.Estimator(model_fn)

# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
                x={'images': mnist.train.images}, 
                y=mnist.train.labels,
                batch_size=batch_size, num_epochs=None, shuffle=True)

# Train the Model - Here you see that we wrap functions into objects...
model.train(input_fn, steps=num_steps)

# Evaluate the Model

# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
                x={'images': mnist.test.images}, 
                y=mnist.test.labels,
                batch_size=batch_size, shuffle=False)

# Use the Estimator 'evaluate' method
e = model.evaluate(input_fn)

Why Keras was built in one example


# Keras

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    validation_data=(x_test,
                                     y_test),
                    epochs=10, 
                    verbose=1,
                    workers=4,
                    callbacks=[TerminateOnNaN()])

Model/Layer abstraction

  • Static graphs are a good idea
  • Writing static graphs in tf/th is painful
  • Keras makes graph ops (functions) as objects from start (Layers)
  • Still, it's very hardcore to debug anything that fails at the graph level

Cleaner, more intuitive API

# Keras

# Calculate crossentropy

categorical_crossentropy(y_true, y_pred)

# it's like scikit learn !




# Calculate mean of tensor

keras.backend.mean(x, 
                   axis=None, 
                   keepdims=False)

# it's like np.mean(x) !


# Tensorflow

# Calculate crossentropy

tf.nn.softmax_cross_entropy_with_logits(
    labels=None, 
    logits=None, # WTF is a logit ?
    dim=-1,      # EVERYBODY IS USING AXIS
    name=None)


# Calculate mean of tensor

tf.reduce_mean(      # Ok why not just mean ? 
    input_tensor,
    axis=None,
    keepdims=None, 
    name=None,
    reduction_indices=None,# Axis & Reduction indices -> one of them is deprecated...
    keep_dims=None) # Two keep_dims -> One of them is deprecated...

# have you heard of **kwargs ?
  • Writing in Tensorflow is not writing Python
  • numpy is the reference in array/tensor manipulation
  • scikit-learn is the reference in python-based data science

Example use case: Data

|_ dataset
    |_ train
        |_ cats
            |_ cats_01.jpg
        |_ dogs
            |_ dogs_01.jpg
    |_ test
        |_ cats
            |_ cats_01.jpg
        |_ dogs
            |_ dogs_01.jpg

Example use case: I/O pipeline

Keras has nice input pipeline classes

from keras.preprocessing.image import ImageDataGenerator

# Define a data generator with augmentation for train
train_data_generator = ImageDataGenerator(
    rotation_range=90.,
    zoom_range=0.1,
    channel_shift_range=0.1,
    fill_mode='nearest',
    horizontal_flip=True,
    vertical_flip=True,
    rescale=1. / 255.,
)

# Get a generator w/ augmented Data
train_generator = train_data_generator.flow_from_directory(
    TRAIN_DIR,
    target_size=input_shape[:2],
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,
    seed=2018)
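
As a quick sanity check (a sketch, assuming the directory layout shown earlier), pull one augmented batch from the generator and inspect its shape:

# One batch of augmented images and one-hot labels
x_batch, y_batch = next(train_generator)
print(x_batch.shape)  # (batch_size, height, width, 3)
print(y_batch.shape)  # (batch_size, num_classes)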

Example use case: Model

from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Activation,
                          Flatten, Dense, Dropout)

def my_first_model(input_shape, num_classes):
    # We use Sequential API to build this model

    model = Sequential()

    # Feature extractor 
    model.add(Conv2D(32, (3, 3), padding="same", 
            input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(32, (3, 3), padding="same"))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    (...)

    # Feature bottleneck
    model.add(Flatten())

    # Classifier part
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    return model


model = my_first_model(input_shape, num_classes)

Example use case: Fit

from keras.optimizers import SGD

optimizer = SGD(lr=learning_rate, momentum=0.9, nesterov=True)

model.compile(
    loss='categorical_crossentropy',
    optimizer=optimizer,
    metrics=['categorical_accuracy'])

model.fit_generator(
    train_generator,
    steps_per_epoch=train_samples // batch_size,
    epochs=max_epochs,
    validation_data=valid_generator,
    validation_steps=valid_samples // batch_size,
    verbose=2)

Epoch 1/20
 - 20s - loss: 1.4208 - categorical_accuracy: 0.4743 - val_loss: 1.1377 - val_categorical_accuracy: 0.5941
Epoch 2/20
 - 19s - loss: 1.2167 - categorical_accuracy: 0.5628 - val_loss: 1.6693 - val_categorical_accuracy: 0.4429
Epoch 3/20
 - 19s - loss: 1.1094 - categorical_accuracy: 0.6151 - val_loss: 1.1680 - val_categorical_accuracy: 0.5710
Epoch 4/20
 - 18s - loss: 1.0156 - categorical_accuracy: 0.6482 - val_loss: 0.9768 - val_categorical_accuracy: 0.6467
Epoch 5/20
 - 18s - loss: 0.9369 - categorical_accuracy: 0.6791 - val_loss: 0.8757 - val_categorical_accuracy: 0.6658
Epoch 6/20
 - 19s - loss: 0.8891 - categorical_accuracy: 0.6953 - val_loss: 0.7800 - val_categorical_accuracy: 0.7062
Epoch 7/20
 - 18s - loss: 0.8277 - categorical_accuracy: 0.7112 - val_loss: 0.6462 - val_categorical_accuracy: 0.7843

(...)

Example use case: Evaluate w/ sklearn

# Generate predictions for the test set
y_pred, y_true = predict_generator_with_y_true(
    model, test_generator, steps=len(test_generator))


# Go to sklearn. Seamless thanks to the closeness of the APIs

from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score

precision, recall, average_precision = {}, {}, {}

for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(
        y_true[:, i],
        y_pred[:, i])
    average_precision[i] = average_precision_score(
        y_true[:, i], y_pred[:, i])

2. Extensibility

Using Keras every day

Keras main paradigm

Defining your own input pipeline

class Sequence(object):

    def __getitem__(self, index):
        """Gets batch at position `index`.

        # Returns
            A batch
        """

    def __len__(self):
        """
        # Returns
            The number of batches in the Sequence.
        """

    def on_epoch_end(self):
        """Method called at the end of every epoch.
        """
        pass
  • ImageDataGenerator is nice for classification
  • Need more things (detection, GANs, segmentation)? DIY! (see the sketch below)
  • "Sequence" class for feeding data w/ prefetching
  • Pure Python: you can use custom imreads (libvips, OpenCV, rasterio) and custom augmentations (imgaug)
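
A minimal sketch of a custom Sequence (the image_paths / labels arrays and the load_fn reader are hypothetical placeholders, not code from the talk):

import numpy as np
from keras.utils import Sequence

class ImageSequence(Sequence):
    def __init__(self, image_paths, labels, batch_size, load_fn):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size
        self.load_fn = load_fn  # e.g. an OpenCV/libvips imread + imgaug augmentation

    def __len__(self):
        # Number of batches per epoch
        return int(np.ceil(len(self.image_paths) / float(self.batch_size)))

    def __getitem__(self, index):
        # Build the batch at position `index`
        paths = self.image_paths[index * self.batch_size:(index + 1) * self.batch_size]
        y = self.labels[index * self.batch_size:(index + 1) * self.batch_size]
        x = np.stack([self.load_fn(p) for p in paths])
        return x, np.asarray(y)

    def on_epoch_end(self):
        pass  # e.g. shuffle the file list here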

Layers as functions: the functional API

from keras import layers

def identity_block(input_tensor, kernel_size, filters, stage, block):
    # `filters` holds the number of filters of the three convolutions
    filters1, filters2, filters3 = filters

    x = layers.Conv2D(filters1, (1, 1))(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(filters2, kernel_size, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(filters3, (1, 1))(x)
    x = layers.BatchNormalization()(x)

    x = layers.add([x, input_tensor])
    x = layers.Activation('relu')(x)
    return x

Models as Layers: Finetuning

from keras.applications import VGG16
from keras.layers import Input, Flatten, Dense, Activation, Dropout
from keras.models import Model

def finetuning(input_shape, num_classes):
    feature_extractor = VGG16(include_top=False, 
        input_shape=input_shape)
    
    inputs = Input(input_shape)

    # Feature extraction with VGG16 trained on ImageNet
    x = feature_extractor(inputs)

    # Classifier
    x = Flatten()(x)
    x = Dense(512)(x)
    x = Activation("relu")(x)
    x = Dropout(0.5)(x)
    x = Dense(num_classes)(x)
    outputs = Activation("softmax")(x)
    
    model = Model(inputs=inputs, outputs=outputs)

    return model

model = finetuning(input_shape, num_classes)
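
A common companion step (a sketch, not shown in the slides): inside finetuning, before building the Model, freeze the pretrained base so only the new classifier head is trained at first.

    # Freeze the ImageNet-pretrained feature extractor
    for layer in feature_extractor.layers:
        layer.trainable = False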

Callbacks: Controlling the loop

class Callback(object):
    def __init__(self):
        pass

    def set_params(self, params):
        self.params = params

    def set_model(self, model):
        self.model = model

    def on_epoch_begin(self, epoch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_batch_begin(self, batch, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        pass

    def on_train_begin(self, logs=None):
        pass

    def on_train_end(self, logs=None):
        pass
  • Callbacks are the main control tool in Keras
  • You can write pretty much anything (callbacks run outside the graph); see the sketch after the examples below

 

Examples:

Model checkpoint (save best)

LR Scheduler, Cyclic LR

Hard Negative Mining

Custom Eval (PR curves?)
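
A sketch of one such callback (hypothetical, not from the talk): a step learning-rate scheduler that halves the LR every few epochs.

from keras import backend as K
from keras.callbacks import Callback

class StepLRScheduler(Callback):
    def __init__(self, step=10, factor=0.5):
        super(StepLRScheduler, self).__init__()
        self.step = step
        self.factor = factor

    def on_epoch_end(self, epoch, logs=None):
        # Shrink the optimizer's learning rate every `step` epochs
        if (epoch + 1) % self.step == 0:
            lr = K.get_value(self.model.optimizer.lr)
            K.set_value(self.model.optimizer.lr, lr * self.factor)

# Usage: model.fit_generator(..., callbacks=[StepLRScheduler(step=10)])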

Writing custom in-graph ops

from keras import backend as K

class DiceLoss(object):
    def __init__(self, smooth=1.):
        self.smooth = smooth

    def __call__(self, y_true, y_pred):
        smooth = self.smooth

        # Sum over the pixels (inputs are (batch, height, width, classes))
        union = K.sum(y_pred + y_true, axis=(1, 2))
        intersection = K.sum(y_pred * y_true, axis=(1, 2))

        # Per-sample, per-class Dice coefficient
        dice = (2 * intersection + smooth) / (union + smooth)

        # Loss = 1 - Dice (Keras averages it over the batch)
        return 1. - dice
  • Keras is an API front end, so you can write low-level functions if you want
  • Either use existing "exposed" backend functions (K.!) or expose them yourself

Metrics: Monitoring the loop

from keras import backend as K
from keras.layers import Layer

class BinaryAccuracy(object):
    def __init__(self):
        pass

    def __call__(self, y_true, y_pred):
        accuracy = K.equal(y_true, K.round(y_pred))
        return K.mean(accuracy)

class BinaryTruePositives(Layer):
    def __init__(self, name='true_positives', **kwargs):
        super(BinaryTruePositives, self).__init__(name=name, **kwargs)
        self.stateful = True
        self.true_positives = K.variable(value=0, dtype='int32')

    def __call__(self, y_true, y_pred):
        y_true = K.cast(y_true, 'int32')
        y_pred = K.cast(K.round(y_pred), 'int32')
        correct_preds = K.cast(K.equal(y_pred, y_true), 'int32')
        true_pos = K.sum(correct_preds * y_true)

        current_true_pos = self.true_positives * 1

        self.add_update(K.update_add(self.true_positives,
                                     true_pos),
                        inputs=[y_true, y_pred])

        return current_true_pos + true_pos
  • Metrics are written like losses: f(y_true,y_pred)
     
  • They can be stateless (averaged over batch) or stateful

 

 

Loss behaves the same way !
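
Wiring a custom loss or metric into training is then just a matter of passing the callable to compile() (a sketch reusing DiceLoss from the previous slide, with a hypothetical function-style metric):

from keras import backend as K

def binary_accuracy(y_true, y_pred):
    # Stateless metric: any f(y_true, y_pred) returning a tensor works
    return K.mean(K.equal(y_true, K.round(y_pred)))

model.compile(loss=DiceLoss(smooth=1.),
              optimizer='adam',
              metrics=[binary_accuracy])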

Layers: Wrapping ops and graphs


class BilinearUpSampling2D(Layer):
    def __init__(self, size=(2, 2), **kwargs):
        super(BilinearUpSampling2D, self).__init__(**kwargs)
        self.size = size
        # + some params (data_format, ...)

    def compute_output_shape(self, input_shape):
        # Basically returns 2x the input spatial shape

    def call(self, inputs, **kwargs):
        # Call a TF op directly
        new_size = (K.int_shape(inputs)[1] * self.size[0],
                    K.int_shape(inputs)[2] * self.size[1])
        return tf.image.resize_bilinear(inputs, new_size)

    def get_config(self):
        # Serialize the Layer
  • Keras abstracts graph ops as objects.
  • So if you have specific needs and know how to code in TF, you can write a Layer and use it in Keras like any built-in one (see the usage sketch below)
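
Once wrapped, the custom Layer drops into the functional API like any built-in layer (sketch):

# Upsample a feature map `x` by a factor of 2 with the custom layer
x = BilinearUpSampling2D(size=(2, 2))(x)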

3. "Scalability"

Going further with Keras

tf.

So what about tensorflow ?

  • Late 2017: tf.keras was announced
  • TF's own high-level APIs, tf.data and tf.estimator, were released
  • Keras forked into tf.keras and the "Keras community edition"
  • Latest Keras commits tease tf.eager-like features
  • Latest tf releases rely more and more on the Keras API
    (Example: migration of the tf.layers API to keras.layers in tf 1.9)

What do I gain by using tf.keras ?

  • Optimization (no multi backend support to keep)
  • tf.Eager support (dynamic graphs)
  • tf.Estimator and tf.Data compatibility
  • Last but not least:

The immense bliss of having absolutely no documentation about what is different*

*Because it has officially been "ready" since tf 1.4, yet the programmer's guide to Keras is still WIP in tf 1.9!

Oh ye of little faith:

I see you don't like TF very much...

  • Yes
  • It's totally undeserved, sorry Tensorflow
  • OK, Let's do it:
    (this is kinda new: 1.9rc0 from 7th June)
pip install tensorflow-gpu==1.9.0rc1

Using tf.data

import tensorflow as tf
from tensorflow import keras

def parse_function(filename, label):
    image_string = tf.read_file(filename)

    # Don't use tf.image.decode_image, or the output shape will be undefined
    image = tf.image.decode_jpeg(image_string, channels=3)

    # This will convert to float values in [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)

    return image, label

def train_preprocess(image, label):
    # Image data augmentation coded with tf.image.random*
    return image, label
# Let train_images and train_labels be the lists of image filenames and labels

train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
train_dataset = train_dataset.shuffle(len(train_images))
train_dataset = train_dataset.map(parse_function, num_parallel_calls=4)
train_dataset = train_dataset.map(train_preprocess, num_parallel_calls=4)
train_dataset = train_dataset.batch(batch_size)
train_dataset = train_dataset.prefetch(1)
train_dataset = train_dataset.repeat()
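
The validation dataset (used below as val_dataset) is built the same way, minus the shuffling and augmentation. A sketch, assuming val_images / val_labels lists exist:

val_dataset = tf.data.Dataset.from_tensor_slices((val_images, val_labels))
val_dataset = val_dataset.map(parse_function, num_parallel_calls=4)
val_dataset = val_dataset.batch(batch_size)
val_dataset = val_dataset.prefetch(1)
val_dataset = val_dataset.repeat()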

Using tf.data

h = model.fit(
    train_dataset,
    steps_per_epoch=train_samples // batch_size,
    epochs=max_epochs,
    validation_data=val_dataset,
    validation_steps=valid_samples // batch_size,
    verbose=1)

Before / after: (screenshots comparing the training runs)

Using tf.estimators

def input_fn():
    return train_dataset

estimator = keras.estimator.model_to_estimator(model)

estimator.train(input_fn=input_fn)
INFO:tensorflow:loss = 0.80180705, step = 1
INFO:tensorflow:global_step/sec: 211.485
INFO:tensorflow:global_step/sec: 161.321
INFO:tensorflow:loss = 1.25377, step = 201 (1.093 sec)
INFO:tensorflow:global_step/sec: 319.681
INFO:tensorflow:global_step/sec: 320.532
INFO:tensorflow:loss = 0.71999, step = 401 (0.625 sec)
INFO:tensorflow:global_step/sec: 317.201
INFO:tensorflow:global_step/sec: 319.046
INFO:tensorflow:loss = 1.009802, step = 601 (0.629 sec)

Using tf.estimators

  • Faster ! 
  • You gain auto-distribution of training pipeline (yay!) and easy export to model servers
  • You lose callbacks, metrics and all nice things
  • You need to write pure tensorflow ops (notably for eval)

 

Trade off depending on use case & qty of data

 

(not much xp on this...)

Distributed

  • tf.Estimators are nice but there are alternatives
  • Keras provides basic single-node multi-GPU support
  • Otherwise, Horovod:
# Horovod: adjust learning rate based on number of GPUs.
opt = keras.optimizers.SGD(lr=learning_rate * hvd.size(), momentum=0.9)

# Horovod: add Horovod Distributed Optimizer.
opt = hvd.DistributedOptimizer(opt)


callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)...]

model.fit_generator(train_iter,(...)
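
For completeness (a sketch, assuming hvd.init() is called at the top of train.py): a Horovod job is typically launched through MPI, one process per GPU.

mpirun -np 4 python train.py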

Let's finish

(because I'm probably late at this point)

import os

import keras
import tensorflow as tf
from keras import backend as K

def save_as_tensorflow_serving(path_to_model, version="1"):

    K.clear_session()

    # Set ourselves in inference mode
    K.set_learning_phase(0)
    with K.get_session() as sess:
        model = keras.models.load_model(path_to_model)

        # Build the Protocol Buffer SavedModel at 'export_path' (any export directory works)
        export_path = os.path.join("export", version)
        builder = tf.saved_model.builder.SavedModelBuilder(export_path)

        # Specifically name the output_node so that we can get it later
        pred = tf.identity(model.output, name="output_node")

        # Create prediction signature to be used by TensorFlow Serving Predict API
        signature = tf.saved_model.signature_def_utils.predict_signature_def(
            inputs={"images": model.input},
            outputs={"scores": pred},
        )

        # Save the meta graph and the variables
        builder.add_meta_graph_and_variables(
            sess=sess,
            tags=[tf.saved_model.tag_constants.SERVING],
            signature_def_map={
                "predict": signature
            })

        builder.save()

One more thing...

Exporting to production (in 30 secs)

  • Keras models are "portable": you don't need the code that declared a model in order to load it* (see the sketch below)
  • With the tf backend:
    • convert Keras models to TensorFlow inference graphs (for tf.serving or just tf)
    • apply optimizations (freezing, quantization, etc.)
    • Theoretically you could even train a Keras Model, convert it to a tf.Estimator and use tf to export an inference graph

*except your custom layers
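
Concretely (a sketch with a hypothetical file name): the saved HDF5 file carries both architecture and weights, and only custom layers have to be handed back at load time.

from keras.models import load_model

# Architecture + weights travel inside the .h5 file, no model code needed
model = load_model('my_model.h5')

# Custom layers (e.g. the BilinearUpSampling2D above) must be re-declared
model = load_model('my_model.h5',
                   custom_objects={'BilinearUpSampling2D': BilinearUpSampling2D})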

Wrapping things up

 (for real)

Use (tf.)Keras ...

... for its frictionless learning curve

... for its scikit-learn-like API

... for its Model/Layer abstraction

... for a modern tf usage

... because you only have to solve problems that can be expressed as static graphs

... because you like model portability w/o code

... because you need/like model servers such as tf.serving / tf inference

... because this is probably the most mature high level API for now

... because in the end, Occam's razor

...Prefer pytorch...

... if you think static graphs aren't as intuitive as dynamic graphs (and are a pain to debug)

... if you want something even more pythonic

... if you don't need model servers and just like to serve predictions with a Flask App

... if you have a raw numpy/cupy background

... if you need new layers with custom  CUDA kernels

... because you are betting that ONNX and Caffe2 merge will work (!)

... if you think Keras is too slow and has too much overhead

Coming soon at TDS ?

(and Ignite)

Thank you

Bonus slide 1

  • Tensorflow (king of static graphs) is currently developing
    • tf.Eager: dynamic graphs
    • tf.Autograph: tf.Eager to graph
  • Pytorch (king of dynamic graphs) is currently developing
    • the Caffe2 merge for dynamic-to-static
    • ONNX for model portability

Bonus slide 2
