Resilient AI

a review

Outline

  • Introduction
  • Target Domains
  • Attacks 
  • Defenses
  • Challenges & Discussion

Introduction

  • Deep Neural Networks are still vulnerable to adversarial attacks despite new defenses
  • Adversarial attacks can be imperceptible to humans

Target Domains

  • Computer Vision
  • Natural Language Processing
  • Malware

Computer Vision

  • Most studied domain
  • Continuous input space
  • Compatible with gradient-based attacks

Computer Vision

  • Misclassification of image recognition

    • Face recognition

    • Object detection

    • Image segmentation

  • Reinforcement learning

  • Generative modeling

Natural Language Processing

  • Discrete input space
  • Not directly compatible with gradient-based attacks
    • Local search algorithm
    • Reinforcement learning

Malware Detection

  • Discrete input space
    • Adapted JSMA, a gradient-based algorithm

    • Genetic algorithm for evasive malicious PDF files

    • Local search in latent space of MalGAN

    • Reinforcement learning algorithm where successful evasion is treated as the reward

Attacks

  • Direct gradient-based
  • Search-based

Direct Gradient-based Attacks

  • Mostly used in Computer Vision domain

  • Uses gradients of the target model to directly perturb pixel values
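
A minimal sketch of the idea, assuming a toy logistic-regression "image" classifier in place of a deep network. The weights, the three-pixel input, and the step size eps are all made up for illustration. The attack is the single-step FGSM: each pixel moves by eps in the direction that increases the model's loss.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """Single step: shift every pixel by eps along the sign of the loss gradient."""
    grad = input_gradient(w, b, x, y)
    moved = [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
    # Keep the result inside the valid pixel range [0, 1].
    return [min(1.0, max(0.0, v)) for v in moved]

# Hypothetical model and input (not taken from the slides).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.6, 0.2, 0.5], 1

x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict(w, b, x), predict(w, b, x_adv))
```

With these toy numbers the clean input is classified as class 1 and the perturbed input falls on the other side of the 0.5 decision threshold, while no pixel moves by more than eps.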

Direct Gradient-based Attacks

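  • Mechanism lies in optimizing an objective function with two components:

      • Distance between the clean and adversarial input: Lp norm distance in direct input space

      • Misclassification term: how strongly the target model is pushed toward a wrong prediction on the adversarial input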
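
One common way to write such a two-part objective, given here as a sketch in the style of the C&W attack. The symbols are assumptions introduced for illustration and do not appear in the slides: x is the clean input, \delta the perturbation, f the target model, t the label the adversary wants, \ell a loss that is small when f predicts t, and c a constant that balances the two terms.

```latex
\min_{\delta} \; \|\delta\|_p \; + \; c \cdot \ell\big(f(x + \delta),\, t\big)
\quad \text{subject to} \quad x + \delta \in [0, 1]^n
```

The first term keeps the adversarial input close to the clean one; the second rewards misclassification.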

Direct Gradient-based Attacks

  • Can be applied in a white or black box manner

    • White-box: Adversary has access to 1) target model’s architecture and parameters, 2) training data, 3) training algorithm, 4) hyperparameters

    • Black-box: Adversary might only have access to target model’s prediction (might include confidence level)

      • Transfer attacks from a single substitute model or an ensemble of substitute models

Direct Gradient-based Attacks

  • Trade-off between effectiveness (perturbation & misclassification rate) and computational time

Direct Gradient-based Attacks

  • Single-step or iterative

  • Successful gradient-based approaches

    • FGSM

      • i-FGSM

      • R+FGSM

    • JSMA

    • C&W

    • PGD
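
To illustrate the iterative variants, here is a hedged sketch of a PGD-style loop on a toy logistic-regression model. The model, input, and all parameter values are hypothetical. Each iteration takes a small signed-gradient step and then projects back into the L-infinity ball of radius eps around the clean input. PGD as usually described also starts from a random point inside that ball; this sketch starts from the clean input, as I-FGSM does, to stay deterministic.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def pgd(w, b, x, y, eps, step, n_steps):
    """Iterative signed-gradient ascent with projection onto the eps-ball."""
    x_adv = list(x)
    for _ in range(n_steps):
        grad = input_gradient(w, b, x_adv, y)
        # Small step in the direction that increases the loss.
        x_adv = [v + step * ((g > 0) - (g < 0)) for v, g in zip(x_adv, grad)]
        # Project back so that |x_adv - x| <= eps for every pixel.
        x_adv = [min(xi + eps, max(xi - eps, v)) for xi, v in zip(x, x_adv)]
        # Stay inside the valid pixel range [0, 1].
        x_adv = [min(1.0, max(0.0, v)) for v in x_adv]
    return x_adv

# Hypothetical model and input (not taken from the slides).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.6, 0.2, 0.5], 1

x_adv = pgd(w, b, x, y, eps=0.2, step=0.05, n_steps=10)
print(predict(w, b, x), predict(w, b, x_adv))
```

The extra iterations cost more model evaluations than a single FGSM step, which is the effectiveness versus computation trade-off mentioned earlier.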

Search-based Attacks

  • Evolutionary & genetic algorithm

    • PDF-Malware evasion

    • Image misclassification from noisy images

  • Local search algorithm

    • Comprehension task using greedy search

    • Malware evasion

    • Translation task with adversarial examples searched from latent space of autoencoder
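
A rough sketch of a greedy local search over a discrete input, assuming a black-box setting where only the model's score can be queried. The scoring function, word weights, and synonym table below are invented stand-ins, not a real classifier or lexicon.

```python
def toy_score(tokens):
    """Hypothetical black-box classifier: higher means more positive sentiment."""
    weights = {"great": 2.0, "good": 1.0, "fine": 0.2,
               "love": 2.0, "like": 0.5, "bad": -2.0}
    return sum(weights.get(t, 0.0) for t in tokens)

# Hypothetical substitution candidates for each word.
SYNONYMS = {"great": ["good", "fine"], "love": ["like"]}

def greedy_substitution_attack(tokens, score_fn, synonyms,
                               threshold=1.0, max_swaps=3):
    """Swap one word per round, always taking the swap that lowers the score most."""
    current = list(tokens)
    for _ in range(max_swaps):
        if score_fn(current) < threshold:
            break  # the prediction has already flipped
        best, best_score = None, score_fn(current)
        for i, tok in enumerate(current):
            for cand in synonyms.get(tok, []):
                trial = current[:i] + [cand] + current[i + 1:]
                s = score_fn(trial)
                if s < best_score:
                    best, best_score = trial, s
        if best is None:
            break  # no single swap improves the attack
        current = best
    return current

original = ["i", "love", "this", "great", "movie"]
adversarial = greedy_substitution_attack(original, toy_score, SYNONYMS)
print(adversarial, toy_score(adversarial))
```

No gradients are needed, which is why this family of attacks suits discrete domains such as text and malware.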

Defenses

  • Most defenses are in computer vision domain

  • Adversarial retraining

  • Regularization techniques

  • Certification & Guarantees

  • Network distillation

  • Adversarial detection

  • Input reconstruction

  • Ensemble of defenses

  • New model architecture

Adversarial Retraining

  • Strengthen target model by training it on adversarial examples generated by attacks
  • Choice of attacks used affects effectiveness
  • Ensemble adversarial training is effective against black box transfer attacks
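
A minimal sketch of adversarial retraining, assuming a logistic-regression model and FGSM as the training-time attack. The four-point dataset, learning rate, and epoch count are illustrative choices only. In every epoch each training example is replaced by an adversarial version crafted against the current model.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    """Single-step attack used to generate adversarial examples."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]  # d(loss)/d(x) for cross-entropy
    moved = [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
    return [min(1.0, max(0.0, v)) for v in moved]

def adversarial_train(data, eps, lr=0.5, epochs=200):
    """Full-batch gradient descent on FGSM examples instead of clean ones."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)    # attack the current model
            err = predict(w, b, x_adv) - y   # d(loss)/d(logit)
            grad_w = [g + err * v for g, v in zip(grad_w, x_adv)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def robust_accuracy(w, b, data, eps):
    """Fraction of examples still classified correctly after an FGSM attack."""
    hits = 0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        hits += int((predict(w, b, x_adv) > 0.5) == (y == 1))
    return hits / len(data)

# Hypothetical, linearly separable toy data: (features, label).
data = [([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.1, 0.9], 0), ([0.2, 0.8], 0)]
w, b = adversarial_train(data, eps=0.1)
print(robust_accuracy(w, b, data, eps=0.1))
```

Swapping fgsm for a stronger or different attack changes which perturbations the model becomes robust to, which is the dependence on the chosen attack noted above.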

Regularization Techniques

  • Regularize the model’s confidence level in its predictions

  • Adversarial Logit Pairing, Clean Logit Pairing, Logit Squeezing
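
As a hedged illustration of how these penalties are commonly described: logit squeezing penalizes the size of the logits, adversarial logit pairing penalizes the distance between the logits of a clean input and its adversarial counterpart, and clean logit pairing applies the same distance to pairs of clean inputs. The squared L2 form, the function names, and the weight below are assumptions made for this sketch.

```python
def logit_squeezing_penalty(logits):
    """Squared L2 norm of the logits; large, over-confident logits cost more."""
    return sum(z * z for z in logits)

def logit_pairing_penalty(logits_a, logits_b):
    """Squared L2 distance between two logit vectors.

    Adversarial logit pairing: logits_a from a clean input, logits_b from
    its adversarial version. Clean logit pairing: both from clean inputs.
    """
    return sum((a - b) ** 2 for a, b in zip(logits_a, logits_b))

def regularized_loss(base_loss, penalty, weight):
    """Training loss with the chosen penalty added, scaled by a weight."""
    return base_loss + weight * penalty

# Hypothetical logits for one clean input and its adversarial counterpart.
clean_logits = [2.0, -1.0]
adv_logits = [0.5, 0.5]

pairing = logit_pairing_penalty(clean_logits, adv_logits)
squeeze = logit_squeezing_penalty([3.0, -4.0])
total = regularized_loss(1.0, pairing, weight=0.5)
print(pairing, squeeze, total)
```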

Certification & Guarantees

  • Direct and approximation methods give guarantees about the absence of adversarial examples within a region of the input space

    • Direct methods (e.g. Reluplex) are computationally intensive and limited in scope

  • Reformulation of the approximation method can make the model more robust

    • Convex approximation as an upper bound to minimize
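
To make "certifying a region" concrete, the sketch below uses plain interval bounds. This is a simpler and looser approximation than the convex relaxation named above, chosen only because it fits in a few lines. It pushes the box [x - eps, x + eps] through a tiny two-layer ReLU network and checks whether the true class must win for every input in the box. The network weights are hypothetical.

```python
def linear_bounds(W, b, lower, upper):
    """Propagate elementwise bounds through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for wgt, l, u in zip(row, lower, upper):
            if wgt >= 0:
                lo += wgt * l
                hi += wgt * u
            else:
                lo += wgt * u
                hi += wgt * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

def relu_bounds(lower, upper):
    """ReLU is monotonic, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def certify(x, eps, W1, b1, W2, b2, true_class):
    """True only if no input within eps (L-infinity) can change the prediction.

    A False result is inconclusive: the bounds may simply be too loose.
    """
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    lo, hi = relu_bounds(*linear_bounds(W1, b1, lo, hi))
    lo, hi = linear_bounds(W2, b2, lo, hi)
    return all(lo[true_class] > hi[j]
               for j in range(len(lo)) if j != true_class)

# Hypothetical two-layer network.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
x = [1.0, 0.2]

small = certify(x, 0.1, W1, b1, W2, b2, true_class=0)
large = certify(x, 0.5, W1, b1, W2, b2, true_class=0)
print(small, large)
```

Training a model to tighten such bounds is one way an approximation method can be turned into a defense.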

Other Techniques

  • Network distillation

    • A second model is trained on the predictions of a first model

    • Overcome by stronger attacks

  • Adversarial Detection

    • Distinguishes adversarial images from ‘clean’ images

    • Overcome by including the detector into the attack’s objective function
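
For the network distillation bullet above, a small sketch of the soft predictions a second model would be trained on, assuming the usual temperature-scaled softmax. The logits and temperature values are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values give softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the first (teacher) model.
teacher_logits = [8.0, 2.0, 0.5]

hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=20.0)
print(hard)
print(soft)
```

The softened distribution, rather than the one-hot label, becomes the training target for the second model.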

Other Techniques

  • Input reconstruction

    • Scrub adversarial images ‘clean’

    • Overcome by attacks that use Expectation over Transformation (EOT)

  • Ensemble of defenses

    • Combines models that use the defenses above

    • Can be overcome if the underlying defenses are weak

Uncertainty Modeling

  • Allows model to express degree of certainty in its prediction: “Know when they do not know”

  • Gaussian Process Hybrid Deep Neural Networks

    • Expresses latent variable as a Gaussian distribution with mean and covariance, encoded in RBF kernels
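
A minimal sketch of the RBF (squared exponential) kernel that such a covariance is built from. The length_scale and variance parameters and the sample points are illustrative assumptions. Nearby points get a covariance close to the variance; distant points get a covariance near zero.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Covariance between two points under an RBF kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return variance * math.exp(-sq_dist / (2.0 * length_scale ** 2))

def covariance_matrix(points, length_scale=1.0, variance=1.0):
    """Pairwise kernel values for a list of points."""
    return [[rbf_kernel(p, q, length_scale, variance) for q in points]
            for p in points]

# Hypothetical one-dimensional latent points.
points = [[0.0], [1.0], [3.0]]
K = covariance_matrix(points)
for row in K:
    print(row)
```

Inputs far from anything seen in training have low covariance with the training points, which is what lets the model report high uncertainty there.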

New Model Architectures

  • “Capsule” networks for images, shown to be more resistant to adversarial examples

  • Model architecture’s inductive bias might better represent the real data distribution

Challenges & Discussion

  • Definition of an adversarial example

    • Studies limited to Lp norms in images

    • No standard definition for discrete domains like NLP

  • Standard of robustness evaluation

    • Benchmarks like CleverHans

    • Certification & guarantees

Challenges & Discussion

  • Ultimate robust model

    • Adversarial examples exist whenever there is classification error

  • Adversarial attacks and defenses in other domains
    • NLP

    • Other neural network architectures

Resilient AI

By Alvin Chan
