Introduction Capsule Networks


Alexander Isenko

28.11.2017

In-House presentation @ iteratec


Motivation


Default Graphics

Instantiation Parameters → Rendering → Image


Inverse Graphics

Image → Inverse Rendering → Instantiation Parameters


Capsule Activations

Image → Capsules → Capsule Activations

Activation vector:

  • Length = estimated probability of presence
  • Orientation = estimated pose parameters


\text{squash}(\textbf{u}) = \frac{||\textbf{u}||^2}{1 + ||\textbf{u}||^2} \cdot \frac{\textbf{u}}{||\textbf{u}||}

Convolutional Layers + Reshape + Squash

Transformation Matrix

\hat{\textbf{u}}_{j|i} = \textbf{W}_{ij}\,\textbf{u}_i \qquad (\textbf{W}_{ij}\text{: learned transformation matrix})
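A minimal sketch of this prediction step, with hypothetical sizes (an 8-D primary-capsule pose mapped to a 16-D higher-level pose; the matrix here is random rather than learned):

```python
import numpy as np

rng = np.random.default_rng(0)

u_i = rng.normal(size=8)          # pose vector of low-level capsule i
W_ij = rng.normal(size=(16, 8))   # transformation matrix (learned in training)

u_hat_ji = W_ij @ u_i             # prediction vector: what capsule i expects
print(u_hat_ji.shape)             # capsule j's output to look like -> (16,)
```

Each low-level capsule makes one such prediction for every higher-level capsule it feeds into.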


Routing by Agreement


Routing by Agreement

\forall i,j: b_{ij} = 0

\text{for } r \text{ routing iterations:}

\quad \textbf{c}_{i} = \text{softmax}(\textbf{b}_{i}) \qquad \text{(routing coefficients)}

\quad \textbf{s}_{j} = \sum_i c_{ij}\, \hat{\textbf{u}}_{j|i} \qquad \text{(weighted sum of prediction vectors)}

\quad \textbf{v}_{j} = \text{squash}(\textbf{s}_{j}) \qquad \text{(output of round)}

\quad b_{ij} \leftarrow b_{ij} + \hat{\textbf{u}}_{j|i} \cdot \textbf{v}_{j} \qquad \text{(agreement update)}

(Diagram: with two output capsules the routing weights start at 0.5 each; after one round they shift to e.g. 0.9/0.1 and 0.2/0.8, depending on how strongly each prediction vector agrees with the output v_j.)
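The routing loop can be put together in a compact NumPy sketch (a toy stand-in with made-up sizes, not the paper's implementation):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def route(u_hat, n_iter=3):
    """Dynamic routing. u_hat: (n_in, n_out, dim) prediction vectors."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                  # equal logits -> uniform coefficients
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum per output capsule
        v = squash(s)                            # capsule outputs, (n_out, dim)
        b = b + (u_hat * v[None]).sum(axis=-1)   # reward agreeing predictions
    return v

rng = np.random.default_rng(1)
u_hat = rng.normal(size=(6, 2, 4))   # 6 input capsules, 2 outputs, 4-D poses
v = route(u_hat)
print(v.shape)                       # (2, 4), each row shorter than 1
```

With equal initial logits, the first-round coefficients for two output capsules are 0.5 each, matching the diagram; the update then pulls weight toward the capsule whose output agrees with the prediction.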


Crowded Scenes


MNIST Classifier

Reconstruction   Routing iterations   Test error (%)
0                1                    0.34 ± 0.032
0                3                    0.35 ± 0.036
1                1                    0.29 ± 0.011
1                3                    0.25 ± 0.005

Baseline: 0.39


MNIST Latent Variables


Takeaway

Pros

  • Routing by agreement is an innovative concept
    • allows for better classification of crowded scenes
    • does not throw away information, in contrast to max pooling
  • The capsule's activation is the length of its output vector
  • The transformation matrices are learned
    • they capture scale and rotation
    • translation is handled by the low-level capsules (CNN)
  • Activation vectors are interpretable
    • enables the creation of a hierarchy
  • A parse tree for every fixation point
  • Requires less training data


Takeaway

\hat{\textbf{u}}_{j|i} = \textbf{W}_{ij}\,\textbf{u}_i \qquad \textbf{s}_j = \sum_i c_{ij}\,\hat{\textbf{u}}_{j|i} \qquad \textbf{v}_j = \text{squash}(\textbf{s}_j)

(\textbf{u}_i: capsule activation, \textbf{W}_{ij}: transformation matrix, \hat{\textbf{u}}_{j|i}: prediction vector, c_{ij}: routing coefficient, \textbf{v}_j: capsule output)


Takeaway

Cons

  • CIFAR10 results were not state of the art
  • Does it scale? ImageNet?
  • Slow training (routing by agreement)
  • Cannot distinguish two instances of the same object close to each other

