Secure your AI models against adversarial threats - Python libraries Cleverhans and Foolbox

By

Deya Chatterjee

 

https://deyachatterjee.github.io

 

 

 

Why do we need to protect our AI models?

  • Data is out there publicly, hence vulnerable.
  • Protecting our models also protects our data (especially confidential data like medical records).
  • Re-identification, de-identification, anonymization and linkage attacks.
  • Data fraud has always existed, but it is stronger now.
  • Adversarial attacks.
  • Dangers to users in specific use cases, even fatalities (e.g., in autonomous driving and healthcare).

Now, what is the goal of our talk?

  • Understand adversarial examples and attacks
  • Understand the dangers and why defenses are needed
  • See how different Python libraries can be used to craft attacks and defenses, and to check robustness
  • Get an idea of the trends in the field
  • Walk through code, demos and snippets
  • Take away (hopefully) an interest in this field, and probably more contributors!

Adversarial examples

1. First, see some examples to gain intuition. 

Note that the confidence for the panda label is far lower than the confidence for the gibbon label!

All three on the right have been classified as ostriches!
Image credits: Adversarial examples for AlexNet by Szegedy et al. (2013) and this amazing book.

Image credits: Brown et al. (2017) and this amazing book.

Banana or toaster? Notice the class labels in the right graph. This is also how physical adversarial attacks are carried out on self-driving cars (SDCs).

Youtube link

Adversarial examples

2. Basic concepts: why does it work?

What is the main goal, you say?

Optimize the noise to maximize the error.

 

Basic technicality for generating adversarial examples

  • Think about gradient descent, i.e., how we normally train our NNs (take baby steps in the direction of the negative gradient to eventually converge to a local minimum)
  • Now think about the loss function: see below
  • Finally, turn the whole thing upside down (call it 'gradient ascent'): step the input, not the parameters, in the direction that increases the loss

// gradient descent: update the parameters theta
theta' = theta - alpha * d(loss)/d(theta)   // where loss is L(X, y, theta)

// generate adversarial examples: step the input x up the loss surface
x' = x + alpha * d(loss)/d(x)
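
A minimal sketch of that one step in TensorFlow 2 (illustrative only; model, x and y are assumed to be a trained tf.keras classifier that outputs class probabilities, a batch of inputs, and their true labels):

import tensorflow as tf

# One 'gradient ascent' step on the input instead of the parameters
def input_gradient_step(model, x, y, alpha=0.01):
    x = tf.convert_to_tensor(x)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)                    # we want d(loss)/d(x), not d(loss)/d(theta)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    return x + alpha * grad              # step *up* the loss surface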

To digress a little, let's talk about GANs

  • A Generator G and a Discriminator D playing off each other (see the toy sketch below)
  • The 'adversarial' in the name comes from here
  • Inspiration: game theory
  • Open problem: relating GANs to adversarial examples (e.g., what if I adversarially attack a GAN? The discriminator D is trained adversarially in that setting, so would it be affected?)
  • Two architectures I will specifically talk about: AT-GAN and CycleGAN
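
A toy sketch of the adversarial game, just to show where the two opposing losses come from (illustrative only; the tiny models and the random stand-in for "real" data below are made up):

import tensorflow as tf

G = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(8,))])                       # generator
D = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(2,))]) # discriminator
bce = tf.keras.losses.BinaryCrossentropy()

z = tf.random.normal((16, 8))        # latent noise
fake = G(z)                          # generated samples
real = tf.random.normal((16, 2))     # stand-in for real data

# D tries to label real as 1 and fake as 0; G tries to make D label fake as 1.
d_loss = bce(tf.ones((16, 1)), D(real)) + bce(tf.zeros((16, 1)), D(fake))
g_loss = bce(tf.ones((16, 1)), D(fake))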

Source: AT-GAN paper by Wang et al. (Paper link)

Source: AT-GAN paper by Wang et al. (Paper link)

Adversarial attacks

1. Cleverhans

2. Types of attacks

What is Cleverhans?

A Python (TensorFlow) library for benchmarking ML systems' vulnerability to adversarial examples

 

pip install cleverhans

Or, to install the latest version as it is on GitHub:

pip install git+https://github.com/tensorflow/cleverhans.git#egg=cleverhans

Based on the attacker's goal, there are two types of attacks (a one-step sketch of the difference follows the two lists):

Non-targeted attacks

  • Generalized type
  • Make the classifier give an incorrect prediction, whatever that prediction may be
  • Just make sure the true label is not output

Targeted attacks

  • Specialized type
  • Make the classifier predict a chosen target class
  • More difficult
  • More dangerous (for fraud, etc.)
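
A minimal one-step sketch of the difference (plain TensorFlow, illustrative; model and x are assumed): an untargeted attack steps the input up the loss on the true label, while a targeted attack steps it down the loss on the chosen target label.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def one_attack_step(model, x, label, alpha=0.01, targeted=False):
    # label = true label for an untargeted attack, desired target label for a targeted one
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(label, model(x))
    grad = tf.sign(tape.gradient(loss, x))
    return x - alpha * grad if targeted else x + alpha * grad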

Based on access to the model's internals, there are again two types of attacks:

Whitebox attacks

  • Complete access to the model
  • Model architecture, parameters, gradients
  • Hence, easier to attack
  • More dangerous; the attacker has more power

Blackbox attacks

  • The attacker only knows the model's output, or perhaps its probability/accuracy on a particular task
  • No knowledge of the model's gradients to exploit
  • However, they can be JUST AS EFFECTIVE (a toy gradient-free sketch follows below)
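
As a toy illustration of the blackbox setting (not a real attack from the talk): a random-search loop that only queries the model's predicted probability for the true class, with no gradients at all. predict_proba is an assumed function returning class probabilities for a single input in [0, 1].

import numpy as np

def blackbox_random_search(predict_proba, x, true_label, eps=0.03, tries=500, seed=0):
    rng = np.random.default_rng(seed)
    best, best_conf = x.copy(), predict_proba(x)[true_label]
    for _ in range(tries):
        candidate = np.clip(x + eps * rng.choice([-1.0, 1.0], size=x.shape), 0.0, 1.0)
        conf = predict_proba(candidate)[true_label]
        if conf < best_conf:              # lower confidence in the true class = better attack
            best, best_conf = candidate, conf
    return best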

FGSM

Fast Gradient Sign Method

from cleverhans.attacks import FastGradientMethod
  • The most basic method. Foundation for advanced attacks.
  • Remember the panda->gibbon example? FGSM did that
  • Whitebox attack
// gradient descent: update the parameters theta
theta' = theta - alpha * d(loss)/d(theta)   // where loss is L(X, y, theta)

// gradient ascent on the input: generate adversarial examples
x' = x + alpha * d(loss)/d(x)

// for FGSM, keep only the sign of the gradient and scale it by epsilon,
// a fixed perturbation per pixel
x' = x + epsilon * sign(d(loss)/d(x))
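
A minimal usage sketch, assuming TensorFlow 1.x and the cleverhans v3.x API (KerasModelWrapper, generate_np); newer cleverhans releases expose the attacks differently, so treat this as illustrative. keras_model, x_test and y_test are assumed.

import tensorflow as tf
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

sess = tf.keras.backend.get_session()          # cleverhans v3 attacks need a TF1-style session
wrapped = KerasModelWrapper(keras_model)       # wrap an already-trained tf.keras classifier

fgsm = FastGradientMethod(wrapped, sess=sess)
x_adv = fgsm.generate_np(x_test,
                         eps=0.03,             # per-pixel perturbation budget (epsilon)
                         clip_min=0.0,         # keep pixels in the valid image range
                         clip_max=1.0)

# Compare performance on clean vs. adversarial inputs
print('clean:              ', keras_model.evaluate(x_test, y_test, verbose=0))
print('adversarial (FGSM): ', keras_model.evaluate(x_adv, y_test, verbose=0))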

L-BFGS

The Limited-memory Broyden-Fletcher-Goldfarb-Shanno attack

  • Slow, but high success rate.
  • Originally demonstrated against convolutional neural networks.
  • Whitebox attack.
  • Extension of this idea: the Carlini-Wagner (C&W) attack (a targeted sketch follows below)
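
A sketch of a targeted Carlini-Wagner attack, again assuming the cleverhans v3.x API (as used in its MNIST C&W tutorial) and reusing wrapped and sess from the FGSM sketch above; x_samples and the one-hot target_onehot labels are assumed.

from cleverhans.attacks import CarliniWagnerL2

cw = CarliniWagnerL2(wrapped, sess=sess)
x_adv = cw.generate_np(x_samples,
                       y_target=target_onehot,      # one-hot targets: the class we want predicted
                       max_iterations=100,
                       binary_search_steps=3,
                       learning_rate=0.1,
                       initial_const=10,
                       batch_size=len(x_samples))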

PGD attack

Projected Gradient Descent (PGD) attack

  • Said to be 'the most complete whitebox adversary': the attacker has essentially unrestricted freedom in crafting the attack
  • Whitebox attack.
  • Closely related: the BIM (Basic Iterative Method) attack, essentially iterative FGSM.
  • Commonly used (a bare-bones version is sketched below).
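
A bare-bones sketch of the BIM/PGD idea in plain TensorFlow 2 (not the cleverhans implementation): repeat small FGSM-style steps and keep projecting the result back into an epsilon-ball around the original input. model is assumed to be a trained tf.keras classifier with inputs in [0, 1].

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def pgd_attack(model, x, y, eps=0.03, step=0.007, iters=10):
    x_orig = tf.convert_to_tensor(x)
    x_adv = tf.identity(x_orig)
    for _ in range(iters):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + step * tf.sign(grad)                         # FGSM-style step
        x_adv = tf.clip_by_value(x_adv, x_orig - eps, x_orig + eps)  # project into the eps-ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)                    # stay in the valid pixel range
    return x_adv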

Consequences

Are our systems at stake?

There are contrasting views on this.

Consequences (contd.)

Are our systems at stake?

  • self-driving cars 
  • medical deep learning systems
  • malware classification in file systems
  • basically, should we trust automated decision tools?
  • adversarial audio: security of automated assistants (e.g., Siri)
  • any deployed ML system is at stake.

Yes, they are.

Code demo

Adversarial examples against medical systems

 

Part 1: visual demo

Image credits for the following two slides: Finlayson et al. and Ma et al. (arXiv), MIT Media Lab and cyber.harvard.edu

Code demo

Adversarial examples against medical systems

 

Part 2: code snippets

The second piece of code was created as a wrapper over https://github.com/sgfin/adversarial-medicine

Code Snippets

Code Snippets

Enough about attacks!

What about defenses?

  • But some studies suggest that attacking ML models may be easier than defending them
  • True, defenses have not been researched as extensively as attacks, but we can hope.
  • Some techniques: Defense-GAN, distillation, input transformations and adversarial training.

Defensive distillation

  • Adds flexibility to the model's classification process so it is less susceptible to exploitation

  • One model (the student) is trained to predict the output probabilities of another model (the teacher) that was trained earlier, in the standard way, on hard labels

However, it can be defeated :(
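
Despite that caveat, here is a rough sketch of the recipe (illustrative; the tiny network and the MNIST-like x_train/y_train are assumed): train a teacher normally, soften its predictions with a temperature T, then train a student to reproduce those soft probabilities.

import tensorflow as tf

T = 20.0  # distillation temperature

def make_logits_net():
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),          # raw logits; softmax applied separately
    ])

# 1) Train the teacher as usual.
teacher = make_logits_net()
teacher.compile('adam', tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
teacher.fit(x_train, y_train, epochs=5, verbose=0)

# 2) Produce "soft" labels: teacher probabilities softened by the temperature T.
soft_labels = tf.nn.softmax(teacher.predict(x_train) / T).numpy()

# 3) Train the student to match those soft probabilities, also at temperature T.
student = make_logits_net()
student_at_T = tf.keras.Sequential([student,
                                    tf.keras.layers.Lambda(lambda z: tf.nn.softmax(z / T))])
student_at_T.compile('adam', 'categorical_crossentropy')
student_at_T.fit(x_train, soft_labels, epochs=5, verbose=0)

# At test time, predict with `student` itself (temperature 1, raw logits).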

Adversarial training

Main takeaway: augment the training set with adversarial examples to make the model more adversarially robust. But it doesn't hold up well when the attacker switches to a strategy not seen during training.
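
A minimal sketch of the idea (illustrative; model is an assumed tf.keras classifier that outputs logits and train_ds an assumed tf.data dataset of (x, y) batches in [0, 1]): craft FGSM examples on the fly and train on a mix of clean and adversarial inputs.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

def fgsm(model, x, y, eps=0.03):
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    return tf.clip_by_value(x + eps * tf.sign(tape.gradient(loss, x)), 0.0, 1.0)

for x, y in train_ds:
    x_adv = fgsm(model, x, y)
    x_mix = tf.concat([x, x_adv], axis=0)      # half clean, half adversarial
    y_mix = tf.concat([y, y], axis=0)
    with tf.GradientTape() as tape:
        loss = loss_fn(y_mix, model(x_mix))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))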

In other news... (Cleverhans on Twitter)

So, what about other libraries in Python?

  • AdverTorch (PyTorch)
  • baidu/AdvBox
  • bethgelab/foolbox (see the sketch below)
  • IBM/adversarial-robustness-toolbox (ART)
  • BorealisAI (PyTorch)
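
Since the talk's title also mentions Foolbox, here is a minimal sketch assuming the Foolbox 3.x native API (fb.TensorFlowModel, fb.attacks.LinfPGD); model, images and labels are an assumed trained tf.keras classifier and tensors of test data in [0, 1]. Check bethgelab/foolbox for the exact, current interface.

import foolbox as fb

fmodel = fb.TensorFlowModel(model, bounds=(0, 1))    # wrap the trained model

attack = fb.attacks.LinfPGD()
raw_advs, clipped_advs, success = attack(fmodel, images, labels, epsilons=0.03)

print('attack success rate:', success.numpy().mean())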

What other than Cleverhans?

This week in using Python to combat adversarial examples

This week in using Python to combat adversarial examples (contd.)

This week in using Python to combat adversarial examples (contd.)

To learn how noise works, the types of noise, and models' sensitivity to noise, check out these notebooks!

Interesting references - Part 1 

Possible explanation for why NNs are susceptible to adversarial attacks in the first place

Interesting references - Part 2

Contribute!

  • To our project: ping me on GitHub/LinkedIn

  • To Cleverhans! Check out the issues.

  • Read the contributing guidelines.

  • Follow the same approach for other libraries.

  • Find one that matches your favorite DL framework!

Thank you!
