a review
Misclassification in image recognition
Face recognition
Object detection
Image segmentation
Reinforcement learning
Generative modeling
Adapted JSMA, a gradient-based algorithm
Genetic algorithm for evasive malicious PDF files
Local search in the latent space of MalGAN
Reinforcement learning algorithm where evasion is treated as the reward (see the sketch below)
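A minimal sketch of the reinforcement-learning idea, assuming a hypothetical action set of functionality-preserving mutations and a placeholder detector score (none of these names come from a real tool); the agent earns a reward only when the mutated sample evades the detector:

```python
# Sketch of RL-based malware evasion: the reward signal is evasion itself.
# ACTIONS, mutate() and malware_score() are hypothetical placeholders, not a real API.
import random

ACTIONS = ["append_bytes", "add_section", "rename_section", "pack"]

def mutate(sample, action):
    # Placeholder: apply a functionality-preserving transformation to the binary.
    return sample + [action]

def malware_score(sample):
    # Placeholder: the target detector's maliciousness score in [0, 1].
    return max(0.0, 0.9 - 0.1 * len(sample))

q_table = {a: 0.0 for a in ACTIONS}      # state-less Q-values, for simplicity
epsilon, alpha = 0.2, 0.5                # exploration rate, learning rate

for episode in range(50):
    sample = []                          # start from the original (unmodified) binary
    for step in range(10):
        action = (random.choice(ACTIONS) if random.random() < epsilon
                  else max(q_table, key=q_table.get))
        sample = mutate(sample, action)
        evaded = malware_score(sample) < 0.5
        reward = 1.0 if evaded else 0.0  # evasion is the only reward
        q_table[action] += alpha * (reward - q_table[action])
        if evaded:
            break
```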
Mostly used in the computer vision domain
Uses the gradient of the target model to directly perturb pixel values
Mechanism lies in optimizing an objective function that contains two components:
A loss term that drives misclassification (e.g., the model's loss with respect to the true or target label)
Distance between the clean and adversarial input: Lp-norm distance in the input space
Can be applied in a white-box or black-box manner
White-box: the adversary has access to 1) the target model's architecture and parameters, 2) the training data, 3) the training algorithm, and 4) the hyperparameters
Black-box: the adversary may only have access to the target model's predictions (possibly including confidence scores)
Transfer attacks from a single substitute model or an ensemble of substitute models
Trade-off between effectiveness (perturbation size & misclassification rate) and computational time
Single-step or iterative
Successful gradient-based approaches (an FGSM sketch follows this list)
FGSM
i-FGSM
R+FGSM
JSMA
C&W
PGD
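As a concrete example of a single-step gradient attack, here is a minimal FGSM sketch, assuming a PyTorch classifier `model` that outputs logits and a correctly labelled batch `(x, y)` with pixel values in [0, 1]:

```python
# Minimal FGSM sketch (PyTorch assumed): perturb the input along the sign of the
# gradient of the loss with respect to the input.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # objective the attacker wants to increase
    loss.backward()
    x_adv = x + eps * x.grad.sign()       # single-step perturbation
    return x_adv.clamp(0, 1).detach()     # keep pixels in the valid range

# Iterative variants (i-FGSM / PGD) repeat this step with a smaller step size and
# project the result back into the L-infinity ball of radius eps around x.
```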
Evolutionary & genetic algorithms (see the sketch after these examples)
PDF-Malware evasion
Image misclassification from noisy images
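A sketch of the black-box evolutionary idea, assuming only query access to a placeholder fitness function `black_box_score` (standing in for the target model's confidence in the attacker's desired outcome); crossover is omitted for brevity, so this is closer to a simple evolution strategy:

```python
# Black-box evolutionary attack sketch: candidates are perturbed inputs, fitness is a
# placeholder query to the target model. In practice a distance penalty is often
# added to the fitness to keep perturbations small.
import numpy as np

rng = np.random.default_rng(0)

def black_box_score(x):
    # Placeholder for the target model's score of the attacker's desired outcome.
    return float(-np.abs(x - 0.7).mean())

def genetic_attack(x0, pop_size=20, generations=100, sigma=0.05):
    pop = x0 + sigma * rng.standard_normal((pop_size,) + x0.shape)
    for _ in range(generations):
        fitness = np.array([black_box_score(p) for p in pop])
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]            # selection
        children = parents[rng.integers(len(parents), size=pop_size)]  # reproduction
        children += sigma * rng.standard_normal(children.shape)        # mutation
        pop = children
    return pop[np.argmax([black_box_score(p) for p in pop])]

x_adv = genetic_attack(np.zeros((8, 8)))
```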
Local search algorithms (see the sketch after these examples)
Comprehension tasks attacked using greedy search
Malware evasion
Translation task with adversarial examples searched in the latent space of an autoencoder
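A sketch of greedy local search over discrete substitutions, with `candidates` and `score` as hypothetical stand-ins for a perturbation generator and the attacker's objective (e.g., the target model's loss):

```python
# Greedy local search sketch: at each step, apply the single substitution that most
# increases the attacker's objective. candidates() and score() are placeholders.
def candidates(token):
    return [token.upper(), token + "s"]           # placeholder perturbations

def score(tokens):
    return sum(t.isupper() for t in tokens)       # placeholder attacker objective

def greedy_attack(tokens, budget=3):
    tokens = list(tokens)
    for _ in range(budget):
        base = score(tokens)
        best_gain, best_edit = 0, None
        for i, tok in enumerate(tokens):
            for cand in candidates(tok):
                gain = score(tokens[:i] + [cand] + tokens[i + 1:]) - base
                if gain > best_gain:
                    best_gain, best_edit = gain, (i, cand)
        if best_edit is None:
            break                                  # no improving substitution left
        i, cand = best_edit
        tokens[i] = cand                           # apply the best single edit
    return tokens

print(greedy_attack("the model reads this sentence".split()))
```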
Most defenses are in the computer vision domain
Adversarial retraining (a training-step sketch follows this list)
Regularization techniques
Certification & Guarantees
Network distillation
Adversarial detection
Input reconstruction
Ensemble of defenses
New model architecture
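A minimal sketch of an adversarial retraining step, assuming a PyTorch classifier and reusing the FGSM step shown earlier; adversarial examples are crafted against the current parameters and mixed into the loss:

```python
# Adversarial retraining sketch: each minibatch is augmented with adversarial
# examples generated on the fly against the current model.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    # Craft FGSM adversarial examples against the current parameters.
    x_pert = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + eps * x_pert.grad.sign()).clamp(0, 1).detach()

    # Train on a mixture of clean and adversarial examples.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```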
Regularize the model’s confidence in its predictions
Adversarial Logit Pairing, Clean Logit Pairing, Logit Squeezing
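A sketch of the logit-regularization losses, assuming PyTorch and a model that outputs logits; logit squeezing penalizes large (over-confident) logits, while adversarial logit pairing penalizes the gap between the logits of clean/adversarial pairs:

```python
# Logit-regularization sketches (not the papers' exact formulations).
import torch
import torch.nn.functional as F

def logit_squeezing_loss(model, x, y, lam=0.05):
    logits = model(x)
    return F.cross_entropy(logits, y) + lam * logits.pow(2).mean()

def logit_pairing_loss(model, x, x_adv, y, lam=0.5):
    logits, logits_adv = model(x), model(x_adv)
    pairing = (logits - logits_adv).pow(2).mean()   # keep the two sets of logits close
    return F.cross_entropy(logits_adv, y) + lam * pairing
```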
Direct and approximate methods to certify guarantees about adversarial examples within a region of the input space
Direct methods (e.g., Reluplex) are computationally intensive and limited in scope
Reformulating the approximation as a training objective can make the model more robust
A convex approximation provides an upper bound on the worst-case loss, which is then minimized (a simple interval-bound sketch follows)
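As a simple stand-in for the convex relaxations used in certification, here is an interval-bound sketch for one linear + ReLU layer; it gives guaranteed (if loose) output bounds for every input in an L-infinity ball, and the shapes/values are arbitrary assumptions:

```python
# Interval bound propagation for a single linear + ReLU layer: a loose outer
# approximation of the set of reachable outputs for all inputs within eps of x.
import numpy as np

def interval_linear_relu(W, b, x, eps):
    lower, upper = x - eps, x + eps
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    out_lower = W_pos @ lower + W_neg @ upper + b   # worst case per output unit
    out_upper = W_pos @ upper + W_neg @ lower + b   # best case per output unit
    return np.maximum(out_lower, 0.0), np.maximum(out_upper, 0.0)  # through ReLU

# If, at the final layer, the lower bound of the true-class logit exceeds the upper
# bounds of all other logits, the example is certified robust within that ball.
```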
Network distillation
Overcome by stronger attacks
A second (student) model is trained on the softened predictions of the original model
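A sketch of the distillation loss used to train the second (student) network on the first network's softened predictions, assuming PyTorch logits for both models; the temperature T is the key hyperparameter:

```python
# Distillation loss sketch: the student matches the teacher's softmax at temperature T.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=20.0):
    soft_targets = F.softmax(teacher_logits / T, dim=1)    # teacher's soft labels
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
```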
Adversarial detection
Distinguishes adversarial images from ‘clean’ images
Overcome by including the detector in the attack’s objective function
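A sketch of adversarial detection framed as binary classification, using scikit-learn's LogisticRegression on raw pixels for brevity (in practice detectors typically use hidden activations or statistics of the target model):

```python
# Detector sketch: fit a binary classifier on clean vs. adversarial examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_detector(clean_images, adv_images):
    X = np.vstack([clean_images.reshape(len(clean_images), -1),
                   adv_images.reshape(len(adv_images), -1)])
    y = np.concatenate([np.zeros(len(clean_images)), np.ones(len(adv_images))])
    return LogisticRegression(max_iter=1000).fit(X, y)

# detector.predict(x.reshape(1, -1)) returns 1 for inputs flagged as adversarial.
```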
Input reconstruction
Scrubs adversarial images ‘clean’ before they reach the classifier
Overcome by attacks using Expectation over Transformation (EOT)
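A sketch of input reconstruction as a preprocessing step, assuming PyTorch and a toy autoencoder architecture for 28x28 images (it would have to be trained to reconstruct clean data before use):

```python
# Input-reconstruction sketch: pass inputs through a denoising autoencoder before
# classification so small perturbations are projected back toward the data manifold.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(                 # toy architecture for 1x28x28 images
    nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(), nn.Unflatten(1, (1, 28, 28)),
)

def defended_predict(classifier, x):
    with torch.no_grad():
        x_clean = autoencoder(x)             # 'scrub' the perturbation
        return classifier(x_clean).argmax(dim=1)
```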
Ensemble of defenses
An ensemble of models combining the above defenses
Can be overcome if the underlying defenses are weak
Allows the model to express its degree of certainty in a prediction: “know when it does not know”
Gaussian Process Hybrid Deep Neural Networks
Expresses the latent variables as a Gaussian distribution with mean and covariance, encoded with RBF kernels (see the sketch below)
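A simplified stand-in for the hybrid idea, fitting a scikit-learn Gaussian Process classifier with an RBF kernel on the network's penultimate-layer features (the actual GP hybrid DNN is trained end to end; this only illustrates the uncertainty-aware prediction head):

```python
# GP prediction head sketch: an RBF-kernel Gaussian Process replaces the softmax
# layer and operates on latent features, giving uncertainty-aware predictions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def fit_gp_head(latent_features, labels):
    # latent_features: (n_samples, d) activations from the penultimate DNN layer.
    return GaussianProcessClassifier(kernel=RBF(length_scale=1.0)).fit(latent_features, labels)

# gp.predict_proba(z) returns class probabilities whose spread reflects uncertainty,
# unlike a plain softmax, which can be overconfident far from the training data.
```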
“Capsule” networks for images, shown to be more resistant to adversarial examples
The architecture’s inductive bias might better represent the real data distribution
Definition of an adversarial example
Studies are mostly limited to Lp-norm perturbations of images (a common formalization is written out below)
No standard definition for discrete domains like NLP
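For reference, the commonly used Lp-ball formalization in the image setting (x is a clean input with true label y, f the classifier, ε the perturbation budget):

```latex
x' \text{ is adversarial for } f \text{ at } x
\iff \|x' - x\|_p \le \epsilon \ \text{ and } \ f(x') \ne y,
\qquad \text{assuming } f(x) = y .
```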
Standards for robustness evaluation
Benchmarks like CleverHans
Certification & guarantees
Whether an ultimately robust model is achievable
Adversarial examples exist whenever there is classification error
NLP
Other neural network architectures