What are adversarial examples?

@tiffanysouterre

Developer Relations

@Microsoft

Tiffany Souterre

WTM Ambassador

@WTM

(Ian J. Goodfellow, et al. 2016)

[Figure: "panda" (57.7% confidence) + 0.007 × (perturbation) = "gibbon" (99.3% confidence)]
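The "+0.007 ×" term is a perturbation in the style of the fast gradient sign method (FGSM) described by Goodfellow et al. It nudges every input value by ε in the direction that increases the model's loss: x_adv = x + ε · sign(∇_x J(θ, x, y)). The sketch below applies one FGSM step to a toy logistic-regression classifier in NumPy, not to a real image network. The weights, the input and ε are made-up values. ε is larger than the slide's 0.007 only because this toy input has 100 dimensions rather than the hundreds of thousands in an image, and the effect of a sign perturbation on a linear score grows with the number of inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(np.dot(w, x) + b)

def fgsm(w, b, x, y, eps):
    """One fast-gradient-sign step: x + eps * sign(d loss / d x).

    For logistic regression with binary cross-entropy loss,
    d loss / d x = (p - y) * w, where p is the predicted probability.
    """
    p = predict(w, b, x)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Hypothetical model and input: 100 "pixels", true label y = 1.
rng = np.random.default_rng(0)
w = rng.normal(size=100)
b = 0.0
x = 0.05 * np.sign(w)  # an input the model confidently assigns to class 1
y = 1.0

x_adv = fgsm(w, b, x, y, eps=0.07)
print(f"clean: {predict(w, b, x):.3f}  adversarial: {predict(w, b, x_adv):.3f}")
print("largest per-pixel change:", np.max(np.abs(x_adv - x)))
```

Every input moves by the same small amount, but because all the moves push the score the same way, the prediction flips from class 1 to class 0.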

(Christian Szegedy, et al. 2016)

(Kurakin A., et al. 2017)

(Mahmood Sharif, 2016)

(Anish Athalye, et al. 2018)

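[Figure: a 28 px × 28 px image of a handwritten digit. Each pixel holds a grayscale value between 0.00 and 1.00 (for example 0.38) and feeds one of the 784 neurons of the input layer; hidden layers connect it to an output layer of 10 neurons, one per digit 0 to 9.]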
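Feeding the image to the input layer means flattening the 28 × 28 grid of grayscale values into a vector of 28 × 28 = 784 numbers, one per input neuron. Here is a minimal NumPy sketch. It uses a made-up random image in place of a real digit and assumes raw 8-bit pixels (0 to 255) that are scaled to the 0.00 to 1.00 range shown in the figure.

```python
import numpy as np

# Hypothetical 28 x 28 grayscale digit with 8-bit pixel values (0-255).
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Scale each pixel to [0.0, 1.0] (0.0 = black, 1.0 = white in the usual convention).
scaled = image.astype(np.float32) / 255.0

# Flatten row by row: pixel (row, col) becomes input neuron row * 28 + col.
a0 = scaled.reshape(-1)

print(a0.shape)  # (784,)
```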

[Figure: the 784-neuron input layer connected to the first hidden layer; neuron a_0^{(1)} receives the input activations a_0^{(0)}, a_1^{(0)}, a_2^{(0)}, ... through the weights w_{0,0}, w_{0,1}, w_{0,2}, ...]

a_0^{(1)} = \sigma(w_{0,0}\, a_0^{(0)} + w_{0,1}\, a_1^{(0)} + \dots + w_{0,n}\, a_n^{(0)} + b_0)
a_1^{(1)} = \sigma(w_{1,0}\, a_0^{(0)} + w_{1,1}\, a_1^{(0)} + \dots + w_{1,n}\, a_n^{(0)} + b_1)
a_2^{(1)} = \sigma(w_{2,0}\, a_0^{(0)} + w_{2,1}\, a_1^{(0)} + \dots + w_{2,n}\, a_n^{(0)} + b_2)

a^{(1)} = \sigma\left( \begin{bmatrix} w_{0,0} & w_{0,1} & \dots & w_{0,n}\\ w_{1,0} & w_{1,1} & \dots & w_{1,n}\\ \vdots & \vdots & \ddots & \vdots\\ w_{k,0} & w_{k,1} & \dots & w_{k,n} \end{bmatrix} \begin{bmatrix} a_0^{(0)}\\ a_1^{(0)}\\ \vdots\\ a_n^{(0)} \end{bmatrix} + \begin{bmatrix} b_0\\ b_1\\ \vdots\\ b_k \end{bmatrix} \right) = \sigma(W a^{(0)} + b)
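The neuron-by-neuron equations and the matrix form compute the same thing. The NumPy sketch below checks this for one layer. It assumes a sigmoid activation σ, 784 inputs, a hypothetical hidden layer of 16 neurons, and random weights W, biases b and input activations a^(0) in place of trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 784, 16                          # 16 hidden neurons is an arbitrary choice
a0 = rng.random(n_in)                             # input activations a^(0), values in [0, 1)
W = rng.normal(scale=0.1, size=(n_hidden, n_in))  # row j holds w_{j,0} ... w_{j,n}
b = rng.normal(size=n_hidden)                     # one bias b_j per hidden neuron

# Neuron by neuron: a_j^(1) = sigma(sum_i w_{j,i} a_i^(0) + b_j)
a1_loop = np.array([
    sigmoid(sum(W[j, i] * a0[i] for i in range(n_in)) + b[j])
    for j in range(n_hidden)
])

# All neurons at once: a^(1) = sigma(W a^(0) + b)
a1_matrix = sigmoid(W @ a0 + b)

print(np.allclose(a1_loop, a1_matrix))  # True
```

The matrix version is how frameworks actually evaluate a layer, because a single matrix-vector product is far faster than the explicit double loop.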

[Figure: the same network seen as a whole: 784 input neurons, hidden layers, and an output layer of 10 neurons for the digits 0 to 9]

f(a_0, \dots, a_{783}) = \begin{bmatrix} y_0\\ \vdots\\ y_9 \end{bmatrix}
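Stacking layers turns the whole network into a single function f that maps the 784 input activations to 10 output values y_0 to y_9. The predicted digit is the index of the largest output. The sketch below assumes a hypothetical architecture with two hidden layers of 16 neurons and uses random, untrained weights, so the digit it predicts is meaningless.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    """Random (W, b) pairs for consecutive layer sizes, e.g. [784, 16, 16, 10]."""
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def f(a, layers):
    """Apply a^(l+1) = sigma(W a^(l) + b) layer after layer."""
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(1)
layers = init_layers([784, 16, 16, 10], rng)
y = f(rng.random(784), layers)

print(y.shape)  # (10,)
print("predicted digit:", int(np.argmax(y)))
```

Training consists of adjusting every W and b so that, for images of a given digit, that digit's output ends up the largest.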

[Figure: three networks of increasing depth: 1 input layer and 1 output layer; 1 input layer, 1 hidden layer and 1 output layer; 1 input layer, 4 hidden layers and 1 output layer]

https://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

Inception-v3 Architecture

Trained on the ImageNet database (1000 classes)

Input: 299 × 299 × 3

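import numpy as np
from keras.preprocessing import image
from keras.applications import inception_v3

# Load the image at the 299 x 299 resolution Inception V3 expects, and display it
img = image.load_img("katoun.png", target_size=(299, 299))
img.show()
input_image = image.img_to_array(img)

# Scale pixel values from [0, 255] to the [-1, 1] range the model expects
input_image /= 255.
input_image -= 0.5
input_image *= 2.

# Add a batch dimension: (299, 299, 3) -> (1, 299, 299, 3)
input_image = np.expand_dims(input_image, axis=0)

# Load Inception V3 with weights pre-trained on ImageNet
model = inception_v3.InceptionV3()

# Run the image through the network: one score per ImageNet class
predictions = model.predict(input_image)

# Look up the most likely class and print its name and confidence
predicted_classes = inception_v3.decode_predictions(predictions, top=1)
imagenet_id, name, confidence = predicted_classes[0][0]
print("This is a {} with {:.4}% confidence!".format(name, confidence * 100))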
[Figure: the input image as three 299 × 299 pixel grids (rows and columns indexed 0 to 298), one per RGB channel]

This is a tabby with 86.86% confidence!
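The three in-place operations in the script (`/= 255.`, `-= 0.5`, `*= 2.`) map 8-bit pixel values from [0, 255] to [-1, 1]. This should be equivalent to the scaling done by `keras.applications.inception_v3.preprocess_input`. The NumPy-only sketch below shows that mapping and its inverse, which is needed to turn a modified input back into a savable image. The function names are illustrative, and the random 299 × 299 × 3 array is a stand-in for the real photo.

```python
import numpy as np

def to_model_range(pixels):
    """[0, 255] -> [-1, 1], same steps as the script: /255, -0.5, *2."""
    x = pixels.astype(np.float32) / 255.0
    x -= 0.5
    x *= 2.0
    return x

def to_pixel_range(x):
    """Inverse mapping [-1, 1] -> [0, 255] as uint8, for saving an image."""
    pixels = (x / 2.0 + 0.5) * 255.0
    return np.clip(np.round(pixels), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(299, 299, 3), dtype=np.uint8)  # stand-in image

x = to_model_range(photo)
batch = np.expand_dims(x, axis=0)  # shape (1, 299, 299, 3), as the model expects
restored = to_pixel_range(x)
print(np.array_equal(restored, photo))  # True: the round trip is lossless
```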

Inception V3

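import numpy as np
from keras.preprocessing import image
from keras.applications import inception_v3
from keras import backend as K
from PIL import Image

# Note: K.gradients and K.learning_phase() need graph-mode Keras
# (e.g. Keras 2.x on a TensorFlow 1.x backend); they fail under TF 2 eager execution.

# Load the pre-trained image recognition model
model = inception_v3.InceptionV3()

# References to the first (input) and last (output) layers of the network
model_input_layer = model.layers[0].input
model_output_layer = model.layers[-1].output

# ImageNet class to fake. Assumption: index 910 ("wooden spoon"); change as needed
object_type_to_fake = 910

# Load the image to hack and scale its pixels to [-1, 1], as in the first script
img = image.load_img("katoun.png", target_size=(299, 299))
original_image = image.img_to_array(img)
original_image /= 255.
original_image -= 0.5
original_image *= 2.
original_image = np.expand_dims(original_image, axis=0)

# Start from a copy of the original image; step size is an assumed value
hacked_image = np.copy(original_image)
learning_rate = 0.1

# The model's confidence that the image belongs to the target class
confidence_function = model_output_layer[0, object_type_to_fake]

# Gradient of that confidence with respect to the input pixels
gradient_function = K.gradients(confidence_function, model_input_layer)[0]

# Keras function returning the current confidence and gradient (learning phase 0 = inference)
grab_confidence_and_gradients_from_model = K.function([model_input_layer, K.learning_phase()], [confidence_function, gradient_function])

spoon_confidence = 0

while spoon_confidence < 0.98:
    spoon_confidence, gradients = grab_confidence_and_gradients_from_model([hacked_image, 0])

    # Nudge the pixels in the direction that raises the target confidence
    hacked_image += gradients * learning_rate

    # Stay within +/-0.1 of the original image and inside the valid [-1, 1] range
    hacked_image = np.clip(hacked_image, original_image - 0.1, original_image + 0.1)
    hacked_image = np.clip(hacked_image, -1.0, 1.0)

# Undo the scaling: [-1, 1] back to [0, 255], then save
img = hacked_image[0]
img /= 2.
img += 0.5
img *= 255.

img = Image.fromarray(img.astype(np.uint8))
img.save("hacked-image.png")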
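The loop above is a targeted, iterative attack. It performs gradient ascent on the target class's confidence and projects every step back into a ±0.1 box around the original image and into the valid [-1, 1] range. The same loop can be reproduced without Keras on a toy softmax classifier, where the gradient of the target probability has a closed form. Everything here is a hypothetical stand-in for Inception V3 and the spoon class: the 3-class linear model, its random weights, the blank starting input, the step size, the 0.9 threshold and the iteration cap.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def confidence_and_gradient(W, b, x, target):
    """p_target and its gradient w.r.t. x for a linear softmax classifier.

    d p_t / d x = p_t * (W[t] - sum_k p_k W[k])
    """
    p = softmax(W @ x + b)
    grad = p[target] * (W[target] - p @ W)
    return p[target], grad

rng = np.random.default_rng(0)
n_classes, n_inputs = 3, 100
W = rng.normal(size=(n_classes, n_inputs))  # hypothetical "model"
b = np.zeros(n_classes)
target = 2                                  # hypothetical class to fake

original = np.zeros(n_inputs)               # a blank input, already in [-1, 1]
hacked = original.copy()
learning_rate, eps = 0.1, 0.1

start_conf, _ = confidence_and_gradient(W, b, original, target)
for step in range(500):
    conf, grad = confidence_and_gradient(W, b, hacked, target)
    if conf >= 0.9:
        break
    hacked += grad * learning_rate
    # Same two projections as the Keras loop: stay within eps of the
    # original, and stay inside the valid input range.
    hacked = np.clip(hacked, original - eps, original + eps)
    hacked = np.clip(hacked, -1.0, 1.0)

final_conf, _ = confidence_and_gradient(W, b, hacked, target)
print(f"target confidence: {start_conf:.3f} -> {final_conf:.3f}")
```

The two `np.clip` calls keep the change small enough to be invisible while the gradient steps steadily raise the target class's score.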

[Figure: Inception V3 scores for the original image: tabby 0.87, spoon 0.00]

This is a tabby with 86.86% confidence!

This is a spoon with 98.65% confidence!

This is a pineapple with 98.93% confidence!

[Figure: three panels: original white image; hacked white image; hacked white image, saturated]

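[Figure: a generator produces fake samples; a discriminator receives these fakes alongside real samples from a database and labels each one real or fake (output 0 or 1)]

Generative Adversarial Networks (GAN)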

(Goodfellow 2016)
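In a GAN the two networks are trained against each other. Under the standard convention, the discriminator D is rewarded for scoring real samples near 1 and generated ones near 0, while the generator G is rewarded when D scores its fakes as real. The NumPy sketch below computes the two usual binary cross-entropy losses, using the common non-saturating generator loss −log D(G(z)). The discriminator outputs are made-up numbers, and no networks are actually trained.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, tiny=1e-12):
    """BCE loss for D: push D(x) -> 1 on real data and D(G(z)) -> 0 on fakes."""
    return -np.mean(np.log(d_real + tiny) + np.log(1.0 - d_fake + tiny))

def generator_loss(d_fake, tiny=1e-12):
    """Non-saturating generator loss: push D(G(z)) -> 1."""
    return -np.mean(np.log(d_fake + tiny))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = np.full(4, 0.5)
print(discriminator_loss(confused, confused))  # 2 * ln 2, about 1.386

# A confident, correct discriminator: low D loss, high G loss.
d_real = np.array([0.95, 0.9, 0.97, 0.92])
d_fake = np.array([0.05, 0.1, 0.02, 0.08])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

Training alternates between lowering D's loss and lowering G's loss. Ideally this continues until D is reduced to guessing, the 0.5 case above.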

Model-based optimization

(Egor Zakharov, et al. 2019)

Thank you!

@tiffanysouterre

(Papernot, et al. 2016)

(Ian J. Goodfellow, 2016)

Tensorflow, there is no spoon

By Tiffany Souterre