Secure AI: a review

Alvin Chan

Outline

  • Introduction
  • Target Domains
  • Attacks 
  • Defenses
  • Challenges & Discussion

Adversarial Attacks

[Figure: “stop sign” image classified as “90 km/h”]

Introduction

  • Deep learning models remain vulnerable to adversarial attacks despite new defenses
  • Adversarial perturbations can be imperceptible to humans

Target Domains

  • Computer Vision
  • Natural Language Processing
  • Malware

Computer Vision

  • Most studied domain
  • Continuous input space
  • Compatible with gradient-based attacks

Computer Vision

  • Misclassification in image recognition

    • Face recognition

    • Object detection

    • Image segmentation

  • Reinforcement learning

Natural Language Processing

  • Discrete input space
  • Not directly compatible with gradient-based attacks
    • Local search algorithm
    • Reinforcement learning

Malware Detection

  • Discrete input space
    • Genetic algorithm for evasive malicious PDF files

    • Local search in the latent space of MalGAN

    • Reinforcement learning where successful evasion is the reward

Attacks

  • Direct gradient-based
  • Search-based

Gradient-based Attacks

  • Mostly used in the computer vision domain

  • Uses the gradient of the target model to directly perturb pixel values

Gradient-based Attacks

  • Optimizing two components:

    • Distance between the clean and adversarial inputs
    • Label predicted for the adversarial image
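The two components are often combined into a single objective, as in C&W-style attacks. A sketch under assumed notation: x is the clean input, δ the perturbation, D a distance (commonly an Lp norm), f the target model, t the target label, L_adv a loss that is small when f predicts t, and c > 0 a weight trading off the two terms.

```latex
\min_{\delta}\; D(x,\, x+\delta) \;+\; c \cdot \mathcal{L}_{\text{adv}}\big(f(x+\delta),\, t\big)
\quad \text{subject to} \quad x+\delta \in [0,1]^n
```

Increasing c favors fooling the model over staying close to x; the box constraint keeps pixel values valid.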

Gradient-based Attacks

  • White-box: access to the architecture & parameters (weights)

  • Black-box: access only to the target model’s predictions

    • Transfer attacks crafted on a single substitute model or an ensemble of substitute models
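A black-box transfer attack can be sketched with two toy linear classifiers: the attacker computes an FGSM step from a substitute model's weights only, then queries the target with the result. All weights, the input, and ε are hypothetical, and the substitute is assumed to share the target's weight signs (as it might if trained on similar data); real transferability is not guaranteed.

```python
def predict(w, b, x):
    """Toy linear binary classifier: +1 if w.x + b >= 0, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_on_substitute(w_sub, x, y, eps):
    """FGSM step computed from the substitute's weights only.

    For logistic loss on the score s = w.x + b, the input gradient is
    -y * sigmoid(-y * s) * w, so its sign is -y * sign(w).
    The result is clipped to the valid range [0, 1].
    """
    return [min(1.0, max(0.0, xi - eps * y * sign(wi))) for wi, xi in zip(w_sub, x)]


# Hypothetical target: the attacker may only call predict() on it.
w_target, b_target = [1.0, -2.0, 0.5], 0.0
# Hypothetical substitute model trained by the attacker.
w_sub = [0.8, -1.5, 0.7]

x = [0.6, 0.1, 0.4]
y = predict(w_target, b_target, x)               # query the target for the label
x_adv = fgsm_on_substitute(w_sub, x, y, eps=0.25)
print(y, predict(w_target, b_target, x_adv))     # label flips from 1 to -1
```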

Gradient-based Attacks

  • Trade-off between effectiveness & computational time

Gradient-based Attacks

  • Single-step or iterative

  • Successful gradient-based approaches

    • FGSM

      • i-FGSM

      • R+FGSM

    • JSMA

    • C&W

    • PGD
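The single-step vs. iterative distinction can be sketched on a toy logistic-regression model, where the input gradient has a closed form. FGSM takes one signed-gradient step of size ε; PGD (iterative FGSM with projection) takes several smaller steps of size α and projects back into the L∞ ball of radius ε. The weights and all parameter values are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def input_gradient(w, b, x, y):
    """d(binary cross-entropy)/dx for p = sigmoid(w.x + b) and label y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0)


def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the loss gradient."""
    g = input_gradient(w, b, x, y)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, alpha, steps):
    """Repeated steps of size alpha, each projected onto the L-inf ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [clip(xa + alpha * sign(gi), xi - eps, xi + eps)
                 for xa, xi, gi in zip(x_adv, x, g)]
        x_adv = [clip(v, 0.0, 1.0) for v in x_adv]  # keep pixels in the valid range
    return x_adv


w, b = [2.0, -3.0, 1.0], 0.0   # hypothetical trained weights
x, y = [0.5, 0.2, 0.3], 1      # clean input, correctly classified as 1
x_fgsm = fgsm(w, b, x, y, eps=0.2)
x_pgd = pgd(w, b, x, y, eps=0.2, alpha=0.05, steps=10)
print(predict(w, b, x), predict(w, b, x_fgsm), predict(w, b, x_pgd))  # 1 0 0
```

For a linear model the gradient sign never changes, so both attacks land on the same point; on nonlinear networks the gradient changes between steps, which is where iterative attacks gain strength at the cost of more computation.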

Search-based Attacks

  • Evolutionary & genetic algorithms

    • PDF-Malware evasion

    • Image misclassification from noisy images

  • Local search algorithm

    • Comprehension task using greedy search

    • Malware evasion
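A search-based attack needs only queries, not gradients. A minimal sketch of greedy local search over binary features against a black-box detector, assuming (hypothetically) that the attacker may only add features (0 → 1 flips) so the malicious functionality is preserved; the detector weights and sample are invented for illustration.

```python
import math


def greedy_evasion(score, x, max_flips):
    """Greedy local search against a black-box detector.

    score(x) returns a maliciousness probability; the attack only queries it.
    Assumption: only 0 -> 1 flips are allowed (features may be added, never removed).
    """
    x = list(x)
    for _ in range(max_flips):
        current = score(x)
        if current < 0.5:          # already classified as benign
            break
        best_i, best_s = None, current
        for i, v in enumerate(x):
            if v == 0:
                x[i] = 1           # try the flip ...
                s = score(x)
                x[i] = 0           # ... and undo it
                if s < best_s:
                    best_i, best_s = i, s
        if best_i is None:         # no single flip helps: stuck in a local optimum
            break
        x[best_i] = 1
    return x


# Hypothetical detector: logistic model over 5 binary features.
W, B = [2.0, 1.5, -1.0, -2.5, 0.5], -1.0


def detector(x):
    return 1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(W, x)) + B)))


malware = [1, 1, 0, 0, 0]
evasive = greedy_evasion(detector, malware, max_flips=3)
print(evasive, round(detector(evasive), 3))  # [1, 1, 1, 1, 0] 0.269
```

A genetic algorithm replaces the single greedy move with a population of candidates that are mutated, recombined, and selected by the same query score.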

Defenses

  • Most defenses target the computer vision domain

  • Adversarial retraining

  • Regularization techniques

  • Certification & Guarantees

  • Network distillation

  • Adversarial detection

  • Input reconstruction

  • Ensemble of defenses

  • New model architecture

Adversarial Retraining

  • Training on adversarial examples
  • The attack used to generate training examples affects effectiveness
  • Ensemble adversarial training
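Adversarial retraining can be sketched as a loop that crafts FGSM examples against the current model at every step and minimizes an equal mix of clean and adversarial loss. The 50/50 weighting, toy data, learning rate, and ε are assumptions. Ensemble adversarial training would additionally draw some examples from other, pre-trained models; that is omitted here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgsm(w, b, x, y, eps):
    """FGSM against the current model; d(BCE)/dx = (p - y) * w."""
    p = sigmoid(logit(w, b, x))
    return [min(1.0, max(0.0, xi + eps * sign((p - y) * wi))) for xi, wi in zip(x, w)]


def bce(w, b, x, y):
    z = logit(w, b, x)
    # log(1 + exp(-z)) for y = 1, log(1 + exp(z)) for y = 0
    return math.log1p(math.exp(-z if y == 1 else z))


def adversarial_train(data, eps, lr=0.5, epochs=200):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)  # regenerated against the current weights
            gw, gb = [0.0] * len(w), 0.0
            for xx in (x, x_adv):          # assumed 50/50 mix of clean and adversarial loss
                err = sigmoid(logit(w, b, xx)) - y
                gw = [g + 0.5 * err * xi for g, xi in zip(gw, xx)]
                gb += 0.5 * err
            w = [wi - lr * g for wi, g in zip(w, gw)]
            b -= lr * gb
    return w, b


def adversarial_loss(w, b, data, eps):
    return sum(bce(w, b, fgsm(w, b, x, y, eps), y) for x, y in data) / len(data)


# Toy 2-D data in [0, 1]^2 (hypothetical).
data = [([0.9, 0.9], 1), ([0.8, 0.9], 1), ([0.1, 0.1], 0), ([0.2, 0.1], 0)]
w, b = adversarial_train(data, eps=0.1)
print(adversarial_loss([0.0, 0.0], 0.0, data, 0.1), adversarial_loss(w, b, data, 0.1))
```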

Regularization Techniques

  • Regularize the model’s confidence in its predictions

  • Adversarial Logit Pairing
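Adversarial Logit Pairing adds a penalty that pulls the logits of an adversarial example toward those of its clean counterpart. A minimal sketch of such a loss, assuming a 50/50 clean/adversarial cross-entropy term, a squared L2 pairing distance, and a hypothetical weight lam; the logits are invented for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def alp_loss(clean_logits, adv_logits, labels, lam=0.5):
    """Mixed clean/adversarial cross-entropy plus a logit-pairing penalty.

    The pairing term is the mean squared L2 distance between the logits of each
    clean example and its adversarial counterpart; lam is an assumed weight.
    """
    n = len(labels)
    ce = sum(0.5 * cross_entropy(c, y) + 0.5 * cross_entropy(a, y)
             for c, a, y in zip(clean_logits, adv_logits, labels)) / n
    pairing = sum(sum((ci - ai) ** 2 for ci, ai in zip(c, a))
                  for c, a in zip(clean_logits, adv_logits)) / n
    return ce + lam * pairing


# Hypothetical logits for one example whose adversarial version drifted toward class 1.
print(alp_loss([[2.0, 0.0]], [[1.0, 1.0]], [0]))  # ≈ 1.41
```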

Certification & Guarantees

  • Guarantee that no adversarial example exists within a region (e.g., an Lp ball) of the input space

    • Direct methods are computationally intensive and limited in scope

    • Convex approximation gives an upper bound on the worst-case loss
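The relaxation idea can be illustrated with its simplest and loosest relative, interval bound propagation, swapped in here for brevity: instead of optimizing over every input in an L∞ ball, each layer pushes a box of lower/upper bounds forward, which soundly over-approximates the reachable outputs. The network weights and ε values are hypothetical.

```python
def interval_linear(W, b, lo, hi):
    """Bounds on W x + b for every x in the box [lo, hi] (W given as a list of rows)."""
    center = [(l + h) / 2 for l, h in zip(lo, hi)]
    radius = [(h - l) / 2 for l, h in zip(lo, hi)]
    out_lo, out_hi = [], []
    for row, bi in zip(W, b):
        c = sum(w * x for w, x in zip(row, center)) + bi
        r = sum(abs(w) * x for w, x in zip(row, radius))
        out_lo.append(c - r)
        out_hi.append(c + r)
    return out_lo, out_hi


def interval_relu(lo, hi):
    # ReLU is monotone, so applying it to both ends keeps the bounds sound.
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def certify(layers, x, eps, label):
    """True if every input within L-inf distance eps of x provably keeps `label`.

    False only means the bound was too loose to decide, not that an attack exists.
    """
    lo = [xi - eps for xi in x]
    hi = [xi + eps for xi in x]
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_linear(W, b, lo, hi)
        if i < len(layers) - 1:  # ReLU after every layer except the output
            lo, hi = interval_relu(lo, hi)
    # Worst case: true logit at its lower bound, every other logit at its upper bound.
    return all(lo[label] > hi[j] for j in range(len(lo)) if j != label)


# Hypothetical 2-2-2 ReLU network.
net = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
       ([[2.0, -1.0], [-1.0, 2.0]], [0.0, 0.0])]
print(certify(net, [1.0, 0.2], 0.1, 0), certify(net, [1.0, 0.2], 0.5, 0))  # True False
```

Tighter convex relaxations certify more inputs at greater cost, mirroring the trade-off between direct and approximate methods.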

Other Techniques

  • Network distillation

    • A second model is trained on the (temperature-softened) predictions of the first model

    • Overcome by stronger attacks

  • Adversarial Detection

    • Distinguishes adversarial images from ‘clean’ images

    • Overcome by including the detector in the attack’s objective function
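Network (defensive) distillation trains the second model on the first model's probabilities computed at a high softmax temperature T. A minimal sketch of the temperature softmax and the soft-label loss; the logits and T values are hypothetical.

```python
import math


def softmax_t(logits, T=1.0):
    """Softmax at temperature T; larger T gives softer probabilities."""
    m = max(logits)  # shift for numerical stability (softmax is shift-invariant)
    exps = [math.exp((v - m) / T) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, T):
    """Cross-entropy of the student's softmax against the teacher's soft labels, both at temperature T."""
    soft_labels = softmax_t(teacher_logits, T)
    student = softmax_t(student_logits, T)
    return -sum(p * math.log(q) for p, q in zip(soft_labels, student))


teacher = [4.0, 1.0, 0.0]          # hypothetical teacher logits
print(softmax_t(teacher, T=1.0))   # peaked, close to one-hot
print(softmax_t(teacher, T=20.0))  # much softer labels for the student
```

At test time the distilled model runs at T = 1, where its softmax saturates and input gradients become tiny; attacks such as C&W that work on the logits directly are not hampered by this.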

Other Techniques

  • Input reconstruction

    • Reconstruct adversarial images back into ‘clean’ ones

    • Overcome by adaptive attacks that account for the reconstruction step

  • Ensemble of defenses

    • Combines several of the above defenses

    • Can be overcome if the underlying defenses are weak
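Input reconstruction can be made concrete with bit-depth reduction (as used in feature squeezing), swapped in here as one simple example: quantizing pixels can map small perturbations back onto the same values as the clean image. Pixel values and the 2-bit setting are hypothetical; other schemes use learned reconstructions instead.

```python
def reduce_bit_depth(pixels, bits):
    """Quantize values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(p * levels) / levels for p in pixels]


clean = [0.2, 0.6, 0.9]
adversarial = [0.23, 0.57, 0.93]  # hypothetical perturbation of at most 0.03 per pixel
print(reduce_bit_depth(clean, 2) == reduce_bit_depth(adversarial, 2))  # True

# An attacker aware of the quantization can instead push a pixel across a rounding threshold.
print(reduce_bit_depth([0.16], 2) == reduce_bit_depth([0.2], 2))       # False
```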

Uncertainty Modeling

  • Express degree of certainty:

    • “Know when they do not know”

  • Gaussian Process Hybrid Deep Neural Networks

    • Expresses latent variables as the parameters of a Gaussian distribution
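The “know when they do not know” property can be illustrated with plain Gaussian-process regression: predictive variance is small near training data and returns to the prior variance far from it. In a GP-hybrid network the GP would act on features learned by the network; here it acts directly on hypothetical 1-D inputs, with an assumed RBF kernel, unit length scale, and small noise term.

```python
import math


def rbf(a, b, length=1.0):
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A, v):
    """Gaussian elimination without pivoting (fine for the SPD kernel matrix here)."""
    n = len(v)
    M = [row[:] + [vi] for row, vi in zip(A, v)]
    for i in range(n):
        for j in range(i + 1, n):
            f = M[j][i] / M[i][i]
            M[j] = [mj - f * mi for mj, mi in zip(M[j], M[i])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_star, x_star) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, var


xs, ys = [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]  # hypothetical training data
mean_near, var_near = gp_predict(xs, ys, 0.0)
mean_far, var_far = gp_predict(xs, ys, 10.0)
print(var_near, var_far)  # small near the data, close to 1 far away
```

A large predictive variance is the signal that the model is outside the region it has evidence about.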

New Model Architectures

  • “Capsule” networks for images

  • Robustness from the new architecture’s inductive bias

Challenges & Discussion

  • Definition of an adversarial example

    • Studies largely limited to Lp-norm bounds on images

    • No standard definition for discrete domains like NLP

  • Standards for robustness evaluation

    • Benchmarks like CleverHans

    • Certification & guarantees
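Most image studies define an adversarial example as one within Lp distance ε of a clean input. A small sketch of the usual distances, with hypothetical inputs; the L0 case is not a true norm and simply counts changed features.

```python
import math


def lp_distance(x, x_adv, p):
    """L_p distance between a clean and an adversarial input (p = 0, 1, 2, ... or math.inf)."""
    diffs = [abs(a - b) for a, b in zip(x, x_adv)]
    if p == 0:
        return sum(1 for d in diffs if d > 0)  # number of changed features
    if p == math.inf:
        return max(diffs)                      # largest single change
    return sum(d ** p for d in diffs) ** (1.0 / p)


x = [0.0, 0.5, 1.0, 0.2]
x_adv = [0.0, 0.8, 0.6, 0.2]  # hypothetical perturbed input
for p in (0, 1, 2, math.inf):
    print(p, lp_distance(x, x_adv, p))
```

Which norm and which ε count as “imperceptible” is a modeling choice, and none of them carries over directly to discrete inputs such as text.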

Challenges & Discussion

  • The ultimate robust model

    • Adversarial examples exist whenever there is classification error

  • Adversarial attacks and defenses in other domains
    • NLP

    • Other neural network architectures

Cheers!

https://slides.com/alvinchan/resilient-ai-6
