Resilient AI
a review
Outline
- Introduction
- Target Domains
- Attacks
- Defenses
- Challenges & Discussion
Introduction
- Deep Neural Networks are still vulnerable to adversarial attacks despite new defenses
- Adversarial attacks can be imperceptible to humans
Target Domains
- Computer Vision
- Natural Language Processing
- Malware
Computer Vision
- Most studied domain
- Continuous input space
- Compatible with gradient-based attacks
Computer Vision
- Misclassification of image recognition
- Face recognition
- Object detection
- Image segmentation
- Reinforcement learning
- Generative modeling
Natural Language Processing
- Discrete input space
- Not directly compatible with gradient-based attacks
- Local search algorithm
- Reinforcement learning
Malware Detection
- Discrete input space
- Adapted JSMA, a gradient-based algorithm
- Genetic algorithm for evasive malicious PDF files
- Local search in the latent space of MalGAN
- Reinforcement learning algorithm where evasion is treated as the reward
Attacks
- Direct gradient-based
- Search-based
Direct Gradient-based Attacks
- Mostly used in the computer vision domain
- Use the gradients of the target model to directly perturb pixel values
Direct Gradient-based Attacks
- The mechanism lies in optimizing an objective function that contains two components:
  - Distance between the clean and adversarial input: Lp norm distance in direct input space
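One common way to write such a two-part objective is sketched below. Only the distance term is stated on the slide; the misclassification loss term and the symbols (perturbation δ, trade-off constant c, loss ℓ, model f, target label t) are assumptions in the style of C&W-type formulations, not the slide's own notation.

```latex
\min_{\delta} \; \|\delta\|_p \;+\; c \cdot \ell\big(f(x + \delta),\, t\big)
\quad \text{subject to} \quad x + \delta \in [0, 1]^n
```

The first term keeps the adversarial input close to the clean input x, and the second term pushes the model's output toward the adversary's goal.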
Direct Gradient-based Attacks
- Can be applied in a white-box or black-box manner
  - White-box: adversary has access to 1) target model’s architecture and parameters, 2) training data, 3) training algorithm, 4) hyperparameters
  - Black-box: adversary might only have access to target model’s predictions (which might include confidence levels)
  - Transfer attacks from a single substitute model or an ensemble of substitute models
Direct Gradient-based Attacks
- Trade-off between effectiveness (perturbation size & misclassification rate) and computational time
Direct Gradient-based Attacks
- Single-step or iterative
- Successful gradient-based approaches
  - FGSM
  - i-FGSM
  - R+FGSM
  - JSMA
  - C&W
  - PGD
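To make the single-step versus iterative distinction concrete, here is a minimal sketch of FGSM and an iterative variant on a toy logistic-regression model. The model, weights, and input values are hypothetical and chosen only for illustration. The iterative function follows the i-FGSM pattern; PGD is usually described with an additional random start, which is omitted here.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Single step: shift every feature by eps in the loss-increasing direction."""
    g = loss_grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def iterative_fgsm(w, b, x, y, eps, step, iters):
    """Iterative variant: small signed steps, clipped to the L-infinity ball of radius eps."""
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_wrt_input(w, b, x_adv, y)
        x_adv = [xi + step * sign(gi) for xi, gi in zip(x_adv, g)]
        # Project back so that no feature moves more than eps from the clean input.
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv

# Hypothetical model and input, for illustration only.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1
x_fgsm = fgsm(w, b, x, y, eps=0.3)
x_iter = iterative_fgsm(w, b, x, y, eps=0.3, step=0.1, iters=5)
```

A real image attack would also clip the result to the valid pixel range; that step is left out to keep the sketch short.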
Search-based Attacks
- Evolutionary & genetic algorithms
  - PDF malware evasion
  - Image misclassification from noisy images
- Local search algorithms
  - Comprehension task using greedy search
  - Malware evasion
  - Translation task with adversarial examples searched from the latent space of an autoencoder
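As an illustration of a search-based attack on a discrete input, the sketch below runs a greedy local search over word substitutions against a toy bag-of-words sentiment scorer. The scorer, the word weights, and the synonym table are all hypothetical. The attack only queries the model's score and never uses gradients.

```python
def score(tokens, weights):
    """Toy sentiment score: sum of per-word weights; a positive score means 'positive'."""
    return sum(weights.get(t, 0.0) for t in tokens)

def greedy_substitution_attack(tokens, weights, synonyms, max_swaps):
    """Repeatedly apply the single word swap that lowers the score the most."""
    tokens = list(tokens)
    for _ in range(max_swaps):
        current = score(tokens, weights)
        if current <= 0:
            break  # the predicted label has already flipped
        best = None
        for i, tok in enumerate(tokens):
            for cand in synonyms.get(tok, []):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                s = score(trial, weights)
                if best is None or s < best[0]:
                    best = (s, trial)
        if best is None or best[0] >= current:
            break  # no available swap helps the attack
        tokens = best[1]
    return tokens

# Hypothetical vocabulary and synonym table, for illustration only.
weights = {"great": 2.0, "good": 1.0, "fine": 0.2, "plot": 0.0, "slow": -0.5}
synonyms = {"great": ["good", "fine"], "good": ["fine"]}
original = ["great", "plot", "slow"]
adversarial = greedy_substitution_attack(original, weights, synonyms, max_swaps=3)
```

Evolutionary and genetic attacks follow the same query-and-select pattern, but maintain a population of candidates and add mutation and crossover steps.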
Defenses
- Most defenses are in the computer vision domain
- Adversarial retraining
- Regularization techniques
- Certification & guarantees
- Network distillation
- Adversarial detection
- Input reconstruction
- Ensemble of defenses
- New model architecture
Adversarial Retraining
- Strengthen target model by training it on adversarial examples generated by attacks
- The choice of attack used to generate the training examples affects effectiveness
- Ensemble adversarial training is effective against black box transfer attacks
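A minimal sketch of adversarial retraining, assuming a toy logistic-regression model and FGSM as the attack that generates training examples. The dataset, learning rate, and the choice to mix clean and adversarial updates are illustrative assumptions, not details from the slides.

```python
import math

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    """Signed-gradient perturbation of x against the current model."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

def sgd_step(w, b, x, y, lr):
    """One gradient step on the cross-entropy loss for a single example."""
    p = predict(w, b, x)
    w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w, b - lr * (p - y)

def adversarial_training(data, eps, lr=0.5, epochs=50):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)        # attack the current model
            w, b = sgd_step(w, b, x, y, lr)      # update on the clean example
            w, b = sgd_step(w, b, x_adv, y, lr)  # update on the adversarial example
    return w, b

def robust_accuracy(w, b, data, eps):
    """Fraction of examples still classified correctly after an FGSM perturbation."""
    hits = 0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        hits += int((predict(w, b, x_adv) > 0.5) == bool(y))
    return hits / len(data)

# Hypothetical, well-separated toy data.
data = [([1.0, 1.0], 1), ([1.2, 0.8], 1), ([-1.0, -1.0], 0), ([-0.8, -1.2], 0)]
w, b = adversarial_training(data, eps=0.2)
```

Swapping `fgsm` for a stronger or different attack changes which perturbations the model learns to resist. Ensemble adversarial training would instead generate the examples from several separately trained models.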
Regularization Techniques
- Regularize the model’s confidence level in its predictions
- Adversarial Logit Pairing, Clean Logit Pairing, Logit Squeezing
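The three regularizers can be sketched as extra penalty terms added to the training loss. The exact form (squared L2, the weighting, function names) is an assumption for illustration. Adversarial logit pairing applies the pairing penalty to the logits of a clean input and its adversarial counterpart, clean logit pairing applies it to the logits of two clean inputs, and logit squeezing penalizes the logit magnitude directly.

```python
def logit_squeezing(logits, weight):
    """Penalty on the squared L2 norm of the logits, discouraging over-confident outputs."""
    return weight * sum(z * z for z in logits)

def logit_pairing(logits_a, logits_b, weight):
    """Penalty on the squared L2 distance between two logit vectors."""
    return weight * sum((a - b) ** 2 for a, b in zip(logits_a, logits_b))

# Hypothetical logits for a clean input and its adversarial counterpart.
clean_logits = [3.0, -1.0, 0.5]
adv_logits = [1.0, 0.5, 0.5]
alp_penalty = logit_pairing(clean_logits, adv_logits, weight=0.5)
squeeze_penalty = logit_squeezing(clean_logits, weight=0.1)
```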
Certification & Guarantees
- Direct and approximation methods provide guarantees about adversarial examples within a region of the input space
  - Direct methods are computationally intensive and limited in scope: Reluplex
- Reformulating the approximation method can make the model more robust
  - Convex approximation as an upper bound to minimize
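To show what a certified bound looks like in practice, the sketch below uses interval bound propagation on a tiny ReLU network. This is a simpler and looser technique than the convex approximation named on the slide, swapped in here because it fits in a few lines. The network weights and inputs are hypothetical.

```python
def linear_bounds(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b, one output row at a time."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # A positive weight takes its minimum at the lower input bound, a negative one at the upper.
        l = bias + sum(w * (l_j if w >= 0 else h_j) for w, l_j, h_j in zip(row, lo, hi))
        h = bias + sum(w * (h_j if w >= 0 else l_j) for w, l_j, h_j in zip(row, lo, hi))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def output_bounds(x, eps, W1, b1, W2, b2):
    """Bounds on the scalar output over the L-infinity ball of radius eps around x."""
    lo = [xi - eps for xi in x]
    hi = [xi + eps for xi in x]
    lo, hi = relu_bounds(*linear_bounds(W1, b1, lo, hi))
    lo, hi = linear_bounds(W2, b2, lo, hi)
    return lo[0], hi[0]

# Hypothetical two-layer network; a positive output means class 1.
W1, b1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]
x = [1.0, 0.5]
small_lo, small_hi = output_bounds(x, 0.1, W1, b1, W2, b2)
large_lo, large_hi = output_bounds(x, 1.0, W1, b1, W2, b2)
```

If the lower bound stays positive, no perturbation within the ball can flip the prediction. Training to maximize that bound is the sense in which a certified bound can double as a robustness objective.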
Other Techniques
- Network distillation
  - Overcome by stronger attacks
  - Another model is trained on the predictions of a model
- Adversarial detection
  - Distinguishes adversarial images from ‘clean’ images
  - Overcome by including the detector in the attack’s objective function
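A minimal sketch of the distillation idea, assuming the usual temperature-scaled softmax: a first model's softened predictions become the training targets for a second model. The temperature value, logits, and function names are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy between the teacher's soft labels and the student's predictions."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

# Hypothetical teacher logits for one input.
teacher_logits = [4.0, 1.0, 0.0]
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=10.0)
matched = distillation_loss(teacher_logits, teacher_logits, temperature=10.0)
mismatched = distillation_loss([0.0, 1.0, 4.0], teacher_logits, temperature=10.0)
```

The loss is smallest when the student reproduces the teacher's softened distribution, which is what the second model is trained toward.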
Other Techniques
- Input reconstruction
  - Scrub adversarial images ‘clean’
  - Overcome by attacks that use expectation over transformation
- Ensemble of defenses
  - Ensemble of models using the above defenses
  - Can be overcome if the underlying defenses are weak
Uncertainty Modeling
- Allows the model to express its degree of certainty in a prediction: “Know when they do not know”
- Gaussian Process Hybrid Deep Neural Networks
  - Express latent variables as a Gaussian distribution with mean and covariance, encoded in RBF kernels
New Model Architectures
- “Capsule” networks for images, which are shown to be more resistant to adversarial examples
- A model architecture’s inductive bias might better represent the real data distribution
Challenges & Discussion
- Definition of an adversarial example
  - Studies limited to Lp norms in images
  - No standard definition for discrete domains like NLP
- Standard of robustness evaluation
  - Benchmarks like CleverHans
  - Certification & guarantees
Challenges & Discussion
- Ultimate robust model
  - Adversarial examples exist whenever there is classification error
- Adversarial attacks and defenses in other domains
  - NLP
  - Other neural network architectures
Resilient AI
By Alvin Chan