What are adversarial examples?

@tiffanysouterre

Developer Relations

@Microsoft

Tiffany Souterre

WTM Ambassador

@WTM

(Ian J. Goodfellow, et al. 2016)

"panda" (57.7% confidence) + 0.007 x [perturbation image] = "gibbon" (99.3% confidence)
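The 0.007 factor in this figure is the step size ε of the fast gradient sign method described in the cited paper: x_adv = x + ε · sign(∇_x J(θ, x, y)). Below is a minimal sketch of that one-step update on a toy logistic-regression classifier. The weights, the input and the helper names (`predict`, `loss_gradient`, `fgsm`) are hypothetical and chosen only for illustration, and ε is much larger than 0.007 because the toy input has only four features.

```python
import math

# Hypothetical toy classifier: p(class 1 | x) = sigmoid(w . x + b)
WEIGHTS = [1.5, -2.0, 0.5, 1.0]
BIAS = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)

def loss_gradient(x, y):
    """Gradient of the cross-entropy loss with respect to the input x.

    For logistic regression, dJ/dx_i = (p - y) * w_i.
    """
    p = predict(x)
    return [(p - y) * w for w in WEIGHTS]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, epsilon):
    """One fast-gradient-sign step: x + epsilon * sign(dJ/dx)."""
    grad = loss_gradient(x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

x = [0.2, -0.1, 0.4, 0.3]   # clean input (hypothetical), true label 1
y = 1
x_adv = fgsm(x, y, epsilon=0.25)

print("clean confidence:      ", round(predict(x), 3))
print("adversarial confidence:", round(predict(x_adv), 3))
```

Each feature moves by at most ε, yet the confidence for the true class drops below 0.5, because every feature is pushed in the direction that increases the loss.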

(Christian Szegedy, et al. 2016)

(Kurakin A., et al. 2017)

(Mahmood Sharif, 2016)

(Anish Athalye, et al. 2018)

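[Figure: a 28 px x 28 px image whose pixel intensities range from 0.00 to 1.00 (example pixel: 0.38), flattened into an input layer of 784 neurons, followed by hidden layers and an output layer]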

(Check out the 3Blue1Brown series on YouTube)

[Figure: input-layer neurons a_0^{(0)}, a_1^{(0)}, a_2^{(0)}, ... (784 neurons) connected to hidden-layer neurons a_0^{(1)}, a_1^{(1)}, a_2^{(1)}; the weights w_{0,0}, w_{0,1}, w_{0,2} lead into a_0^{(1)}]

a_0^{(1)} = \color{yellow}\sigma( \color{green}w_{0,0}\: \color{orange}a_0^{(0)}+ \color{green}w_{0,1}\:\color{orange}a_1^{(0)}+...+\color{green}w_{0,n}\:\color{orange}a_n^{(0)}+\color{turquoise}b_0\color{yellow})

a_1^{(1)} = \color{yellow}\sigma( \color{white}w_{1,0}\: \color{orange}a_0^{(0)}+ \color{white}w_{1,1}\: \color{orange}a_1^{(0)} \color{white}+...+w_{1,n}\: \color{orange}a_n^{(0)}+\color{turquoise}b_1 \color{yellow})

a_2^{(1)} = \color{yellow}\sigma( \color{white}w_{2,0}\: \color{orange}a_0^{(0)}+ \color{white}w_{2,1}\: \color{orange}a_1^{(0)} \color{white}+...+w_{2,n}\: \color{orange}a_n^{(0)}+\color{turquoise}b_2 \color{yellow})

\color{yellow}\sigma \begin{pmatrix} \color{white} \begin{bmatrix} \color{green}w_{0,0} & \color{green}w_{0,1} & \dots & \color{green}w_{0,n}\\ w_{1,0} & w_{1,1} & \dots & w_{1,n}\\ \vdots & \vdots & \ddots & \vdots \\ w_{k,0} & w_{k,1} & \dots & w_{k,n}\\ \end{bmatrix} \begin{bmatrix} \color{orange} a_{0}^{(0)}\\ \color{orange}a_{1}^{(0)}\\ \vdots\\ \color{orange}a_{n}^{(0)}\\ \end{bmatrix} + \begin{bmatrix} \color{turquoise}b_{0}\\ \color{turquoise}b_{1}\\ \vdots\\ \color{turquoise}b_{k}\\ \end{bmatrix} \end{pmatrix} \color{white} = \color{yellow}\sigma(\color{green}W \color{orange}a^{(0)} \color{white}+ \color{turquoise}b\color{yellow})
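The matrix form σ(W a + b) translates almost line for line into code. Here is a minimal sketch in plain Python (no dependencies), where each row of `W` holds the weights of one output neuron. The function name `dense_layer` and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(W, a, b):
    """Compute sigma(W a + b) for one layer.

    W is a list of rows (one row of weights per output neuron),
    a is the list of input activations, b holds one bias per output neuron.
    """
    return [
        sigmoid(sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i)
        for row, b_i in zip(W, b)
    ]

# Hypothetical layer with 3 inputs and 2 outputs
W = [[0.5, -1.0, 0.25],
     [1.0,  0.0, -0.5]]
a0 = [1.0, 0.5, 2.0]
b = [0.0, -0.5]

a1 = dense_layer(W, a0, b)
print(a1)
```

The output has one activation per row of `W`, which is why the bias vector needs as many entries as `W` has rows.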

(Check out the 3Blue1Brown series on YouTube)

[Figure: the same network, from the 28 px x 28 px input (784 neurons) through the hidden layers to the output layer]

f(a_0, \:\dots\:, a_{783}) = \begin{bmatrix} y_0\\ \vdots\\ y_9 \end{bmatrix}
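Seen this way, the whole network is one function from 784 pixel intensities to 10 output scores, obtained by chaining layers. The sketch below illustrates that with random, untrained weights. The hidden-layer sizes (two layers of 16 neurons) and the helper names are assumptions for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(W, a, b):
    """One layer: sigma(W a + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + bi)
            for row, bi in zip(W, b)]

def random_layer(n_in, n_out, rng):
    """Random weights and zero biases; a trained network would learn these."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def f(pixels, layers):
    """Feed 784 pixel intensities through every layer and return 10 outputs."""
    a = pixels
    for W, b in layers:
        a = layer(W, a, b)
    return a

rng = random.Random(0)
# Assumed layer sizes: 784 inputs, two hidden layers of 16, 10 outputs
sizes = [784, 16, 16, 10]
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

pixels = [rng.random() for _ in range(784)]  # stand-in for a flattened 28 x 28 image
y = f(pixels, layers)
print(len(y), "outputs; highest-scoring class:", y.index(max(y)))
```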

(Check out the 3Blue1Brown series on YouTube)

(Chris Olah, 2014)

[Figure panels: (1) 1 input layer, 1 output layer; (2) 1 input layer, 1 hidden layer, 1 output layer; (3) 1 input layer, 4 hidden layers, 1 output layer]

https://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

Inception-v3 Architecture

Trained on the ImageNet database (1000 classes)

Input: 299 x 299 x 3

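import numpy as np
from keras.preprocessing import image
from keras.applications import inception_v3

# Load the image and resize it to the 299 x 299 input size Inception-v3 expects
img = image.load_img("katoun.png", target_size=(299, 299))

# Convert the PIL image to a float array of shape (299, 299, 3)
input_image = image.img_to_array(img)

# Display the resized image
img.show()

# Scale pixel intensities from [0, 255] to [-1, 1], the range the model expects
input_image /= 255.
input_image -= 0.5
input_image *= 2.

# Add a batch dimension: (299, 299, 3) -> (1, 299, 299, 3)
input_image = np.expand_dims(input_image, axis=0)

# Load Inception-v3 with its pre-trained ImageNet weights
model = inception_v3.InceptionV3()

# Run the image through the network
predictions = model.predict(input_image)

# Turn the 1000 class scores into the single most likely label
predicted_classes = inception_v3.decode_predictions(predictions, top=1)
imagenet_id, name, confidence = predicted_classes[0][0]
print("This is a {} with {:.4}% confidence!".format(name, confidence * 100))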
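The three scaling steps in the code above map 8-bit pixel values from [0, 255] to [-1, 1]. The sketch below applies the same arithmetic to single values so the mapping and its inverse are easy to check. The helper names `scale_pixel` and `unscale_pixel` are hypothetical. Keras also provides `inception_v3.preprocess_input`, which should apply the same scaling.

```python
def scale_pixel(value):
    """Map an 8-bit intensity in [0, 255] to [-1, 1]."""
    value /= 255.0
    value -= 0.5
    value *= 2.0
    return value

def unscale_pixel(value):
    """Inverse mapping, from [-1, 1] back to [0, 255]."""
    return (value / 2.0 + 0.5) * 255.0

for raw in (0, 127.5, 255):
    scaled = scale_pixel(raw)
    print(raw, "->", scaled, "->", unscale_pixel(scaled))
```

The inverse mapping is what is needed later to turn a modified array back into an image that can be saved.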

[Figure: the input image as three 299 x 299 arrays, one per color channel, with rows and columns indexed 0 to 298]

This is a tabby with 86.86% confidence!

Inception V3

(Check out Adam Geitgey's series)

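import numpy as np
from keras.preprocessing import image
from keras.applications import inception_v3
from keras import backend as K
from PIL import Image

# Note: K.gradients and K.learning_phase need a graph-mode (TensorFlow 1.x-style) Keras backend

# Load the pre-trained image recognition model
model = inception_v3.InceptionV3()

# Grab references to the first and last layers of the network
model_input_layer = model.layers[0].input
model_output_layer = model.layers[-1].output

# ImageNet class to fake. Assumed value: 910 should be "wooden_spoon"; check the class list
object_type_to_fake = 910

# Load and scale the image to hack, as in the classification example (file name assumed)
img = image.load_img("katoun.png", target_size=(299, 299))
original_image = image.img_to_array(img)
original_image /= 255.
original_image -= 0.5
original_image *= 2.
original_image = np.expand_dims(original_image, axis=0)

# Work on a copy so the original stays available for clipping
hacked_image = np.copy(original_image)

# Step size of each update (assumed value)
learning_rate = 0.1

# Confidence the model assigns to the target class
confidence_function = model_output_layer[0, object_type_to_fake]

# Gradient of that confidence with respect to the input pixels
gradient_function = K.gradients(confidence_function, model_input_layer)[0]

# Function returning the confidence and the gradients for a given image
grab_confidence_and_gradients_from_model = K.function([model_input_layer, K.learning_phase()], [confidence_function, gradient_function])

spoon_confidence = 0

# Nudge the image until the model is at least 98% sure it shows the target class
while spoon_confidence < 0.98:
    # The 0 selects test mode
    spoon_confidence, gradients = grab_confidence_and_gradients_from_model([hacked_image, 0])

    # Move the pixels in the direction that raises the target confidence
    hacked_image += gradients * learning_rate

    # Keep every value within 0.1 of the original and inside the valid [-1, 1] range
    hacked_image = np.clip(hacked_image, original_image - 0.1, original_image + 0.1)
    hacked_image = np.clip(hacked_image, -1.0, 1.0)

# Scale the pixels from [-1, 1] back to [0, 255] and save the result
img = (hacked_image[0] / 2. + 0.5) * 255.
img = Image.fromarray(img.astype(np.uint8))
img.save("hacked-image.png")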
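The loop above is gradient ascent on the target-class confidence, with two clipping steps that keep the result close to the original and inside the valid input range. The same pattern can be tried without Keras on a toy model. In the sketch below, a four-feature logistic function stands in for the network. Its weights, the input, the step size and the threshold are all assumptions, and the threshold is far lower than 0.98 because this toy model cannot get much higher within a 0.1 budget. A step cap is added so the loop always terminates.

```python
import math

# Hypothetical stand-in for the network: confidence = sigmoid(w . x + b)
WEIGHTS = [3.0, -2.0, 4.0, -1.0]
BIAS = -1.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def confidence_and_gradient(x):
    """Return the target-class confidence and its gradient with respect to x."""
    p = sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)
    return p, [p * (1.0 - p) * w for w in WEIGHTS]

def clip(value, low, high):
    return max(low, min(high, value))

original = [0.1, 0.2, -0.1, 0.3]
hacked = list(original)
max_change = 0.1       # per-value budget, as in the loop above
learning_rate = 0.5    # assumed step size
target = 0.30          # assumed threshold, reachable within the budget here
max_steps = 1000       # safety cap so the loop always terminates

confidence, steps = 0.0, 0
while confidence < target and steps < max_steps:
    confidence, gradient = confidence_and_gradient(hacked)
    for i, g in enumerate(gradient):
        value = hacked[i] + g * learning_rate          # gradient ascent step
        value = clip(value, original[i] - max_change,  # stay near the original
                     original[i] + max_change)
        hacked[i] = clip(value, -1.0, 1.0)             # stay a valid input
    steps += 1

start_confidence, _ = confidence_and_gradient(original)
final_confidence, _ = confidence_and_gradient(hacked)
print("steps:", steps)
print("confidence before:", round(start_confidence, 3))
print("confidence after: ", round(final_confidence, 3))
```

Without the first clipping step the confidence could be pushed arbitrarily high, but the input would drift visibly away from the original, which defeats the purpose of an adversarial example.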

[Figure: Inception V3 output confidences for the original image: spoon 0.00, tabby 0.87]

(Check out Adam Geitgey's series)

This is a tabby with 86.86% confidence!

This is a spoon with 98.65% confidence!

This is a pineapple with 98.93% confidence!

Original white image

Hacked white image

Hacked white image saturated

[Figure: a Generator produces fake samples and a Database supplies real samples; a Discriminator labels each sample it receives as Real or Fake]

Generative Adversarial Networks (GAN)

(Goodfellow 2016)
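For reference, the original GAN formulation writes the contest between generator and discriminator as a minimax objective. Here D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for a noise vector z, and p_data and p_z denote the data and noise distributions:

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to make both terms large by scoring real samples near 1 and generated ones near 0, while the generator tries to make the second term small by producing samples that the discriminator scores as real.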

Model-based optimization

4.5 years of GAN progress on face generation. https://t.co/kiQkuYULMC https://t.co/S4aBsU536b https://t.co/8di6K6BxVC https://t.co/UEFhewds2M https://t.co/s6hKQz9gLz pic.twitter.com/F9Dkcfrq8l
— Ian Goodfellow (@goodfellow_ian) January 15, 2019