Optimising Deep Learning for Training and Inference

Soham Chatterjee

Why Optimise?

  • Less Training Time means more experiments

  • Faster Inference means lower latency

  • Hassle-free Deployment

Training

  • Input Pipeline
    • Extract - From Memory
    • Transform - Augmentation or Pre-Processing
    • Load - Send Data to the Model
  • Model

Input Pipeline

  • Batching
  • tf.data API (see the pipeline sketch after this list)
  • Data Format - NCHW or NHWC
  • Building from source
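
A minimal tf.data sketch of the Extract-Transform-Load stages above, assuming TensorFlow 2.x. The file pattern and the parse_and_augment function are illustrative placeholders, not part of the talk:

```python
import tensorflow as tf

def parse_and_augment(path):
    # Transform: decode, resize and augment one image (pre-processing).
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, [224, 224])
    image = tf.image.random_flip_left_right(image)
    return image

def build_pipeline(file_pattern, batch_size=32):
    # Extract: list the input files and read them.
    ds = tf.data.Dataset.list_files(file_pattern)
    # Transform: run pre-processing in parallel across CPU cores.
    ds = ds.map(parse_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
    # Load: batch the data and prefetch so the model never waits on input.
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.AUTOTUNE)
    return ds
```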

Model

Inference and Deployment

  • Hardware-Dependent Optimisations
  • Software-Dependent Optimisations

CPUs or GPUs

  • CPUs for RNNs; GPUs for CNNs
  • Model Parallelism - intra_op and inter_op (see the threading sketch after this list)
  • Data Parallelism and GPU Towers
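
A hedged sketch of these settings, assuming TensorFlow 2.x: intra-op and inter-op parallelism are set through tf.config.threading, and tf.distribute.MirroredStrategy stands in for hand-written GPU towers as the data-parallelism mechanism. The thread counts and the toy model are illustrative only:

```python
import tensorflow as tf

# Intra-op / inter-op parallelism: sizes of the two CPU thread pools.
# Values are illustrative; tune them per machine, before any ops run.
tf.config.threading.set_intra_op_parallelism_threads(8)  # threads inside a single op
tf.config.threading.set_inter_op_parallelism_threads(2)  # ops that may run concurrently

# Data parallelism: replicate the model on every visible GPU.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```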

SSE3, MKL, AVX, AVX2?

  • Different instruction sets have different performance
  • MKL is optimised for NCHW (see the layout sketch after this list)
  • Larger batch sizes improve performance
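
A hedged sketch of the layout choice, assuming TensorFlow 2.x with Keras: "channels_last" is NHWC (the stock CPU default) and "channels_first" is NCHW (preferred by MKL-enabled builds and cuDNN). The layer, shapes, and batch size are illustrative:

```python
import tensorflow as tf

# "channels_last" = NHWC (safe default on stock CPU builds);
# "channels_first" = NCHW (preferred by MKL-enabled CPU builds and GPUs).
data_format = "channels_last"

layer = tf.keras.layers.Conv2D(filters=64, kernel_size=3, data_format=data_format)

# The batch shape must match the chosen layout; larger batches generally
# improve throughput until memory becomes the limit.
batch_size = 64
if data_format == "channels_first":
    x = tf.random.normal([batch_size, 3, 224, 224])   # N, C, H, W
else:
    x = tf.random.normal([batch_size, 224, 224, 3])   # N, H, W, C

y = layer(x)
print(y.shape)
```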

Soham Chatterjee

  • s.chatterjee@saama.com
  • csoham.wordpress.com
  • www.github.com/soham96/Talks
