A review
Alvin Chan
Adversarial Attacks
Example: an adversarially perturbed stop sign is recognized as a "90 km/h" speed-limit sign
Misclassification in image recognition
Face recognition
Object detection
Image segmentation
Reinforcement learning
Genetic algorithm for evasive malicious PDF files
Local search in the latent space of MalGAN
Reinforcement learning where evasion is treated as the reward
Mostly used in the computer vision domain
Use the gradient of the target model to directly perturb pixel values
Optimize two components: the misclassification loss and the size of the perturbation
White-box: access to the model's architecture & parameters (gradients available)
Black-box: access only to the target model's predictions
Transfer attacks crafted on a single substitute model or an ensemble of substitute models
Single-step or iterative
Successful gradient-based approaches:
FGSM (Fast Gradient Sign Method)
i-FGSM (iterative FGSM)
R+FGSM (FGSM with a random start)
JSMA (Jacobian-based Saliency Map Attack)
C&W (Carlini & Wagner attack)
PGD (Projected Gradient Descent)
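A minimal sketch of FGSM and an iterative variant (i-FGSM / PGD) in PyTorch, assuming a differentiable classifier `model`, an input batch `x` scaled to [0, 1], labels `y`, and an L-infinity budget `eps`; these names and the default step sizes are placeholder assumptions, not from the slides.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM: move x by eps along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()       # perturb every pixel to increase the loss
    return x_adv.clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha=2 / 255, steps=10):
    """Iterative variant (i-FGSM / PGD): small FGSM steps, each followed by
    projection back into the L-infinity ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

R+FGSM differs only in adding a small random perturbation before the gradient step, and PGD is usually run from a random start inside the eps-ball.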
Evolutionary & genetic algorithms
PDF malware evasion
Image misclassification from noisy images
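A hedged sketch of one possible black-box evolutionary attack: a population of perturbed inputs is scored purely by querying the target model, so no gradients are needed. `target_score(x)` is a hypothetical function returning the target model's confidence in the correct class (lower is better for the attacker); all names and constants are placeholders, not from the slides.

```python
import numpy as np

def genetic_attack(x, target_score, pop_size=20, generations=100,
                   mutation_rate=0.05, eps=0.1):
    """Evolve small perturbations of x that minimise the target model's
    confidence in the true class, using only black-box queries."""
    rng = np.random.default_rng(0)
    # Initial population: random perturbations inside the eps-box around x.
    pop = [np.clip(x + rng.uniform(-eps, eps, x.shape), 0, 1)
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([target_score(p) for p in pop])  # lower = more evasive
        order = np.argsort(scores)
        elite = [pop[i] for i in order[:pop_size // 2]]    # selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.choice(len(elite), 2, replace=False)
            mask = rng.random(x.shape) < 0.5               # uniform crossover
            child = np.where(mask, elite[a], elite[b])
            mutate = rng.random(x.shape) < mutation_rate   # random mutation
            child = np.where(mutate,
                             np.clip(x + rng.uniform(-eps, eps, x.shape), 0, 1),
                             child)
            children.append(child)
        pop = elite + children
    return min(pop, key=target_score)
```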
Local search algorithms
Comprehension tasks attacked using greedy search
Malware evasion
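A hedged sketch of greedy local search for the malware-evasion case: at each step, flip the single binary feature whose flip most lowers the detector's score, stopping once the prediction flips. `predict_malicious_prob` is a hypothetical black-box query and `x` is a 0/1 feature vector; both are placeholder assumptions.

```python
def greedy_evasion(x, predict_malicious_prob, max_flips=20):
    """Greedily flip one binary feature at a time to reduce the detector's
    malicious-probability, using only black-box queries."""
    x = list(x)
    for _ in range(max_flips):
        base = predict_malicious_prob(x)
        if base < 0.5:                       # detector already evaded
            break
        best_i, best_score = None, base
        for i in range(len(x)):              # try flipping each feature
            x[i] = 1 - x[i]
            score = predict_malicious_prob(x)
            if score < best_score:
                best_i, best_score = i, score
            x[i] = 1 - x[i]                  # undo trial flip
        if best_i is None:                   # no single flip helps: local optimum
            break
        x[best_i] = 1 - x[best_i]            # commit the best flip
    return x
```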
Most defenses are in the computer vision domain
Adversarial retraining (training-loop sketch after this list)
Regularization techniques
Certification & Guarantees
Network distillation
Adversarial detection
Input reconstruction
Ensemble of defenses
New model architectures
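As referenced in the "Adversarial retraining" item above, a minimal adversarial-retraining sketch in PyTorch, reusing the `pgd` attack sketched earlier and assuming a `model`, an `optimizer`, and a `loader` of (x, y) batches; all of these names are placeholder assumptions, not from the slides.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    """One epoch of adversarial retraining: craft adversarial examples
    on the fly against the current model, then train on them."""
    model.train()
    for x, y in loader:
        x_adv = pgd(model, x, y, eps)             # attack the current model
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)   # fit the perturbed batch
        loss.backward()
        optimizer.step()
```

In practice the adversarial batch is often mixed with the clean batch rather than replacing it entirely.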
Regularize the model's confidence in its predictions
Adversarial Logit Pairing
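A hedged sketch of the Adversarial Logit Pairing idea: alongside the usual classification loss, the logits the model produces for a clean input and its adversarial counterpart are pushed together with an L2 penalty. `lam` and the other names are placeholder assumptions.

```python
import torch.nn.functional as F

def alp_loss(model, x, y, x_adv, lam=0.5):
    """Cross-entropy on adversarial inputs plus a pairing term that
    regularizes clean and adversarial logits to stay close."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_adv, y)
    pairing = F.mse_loss(logits_adv, logits_clean)
    return ce + lam * pairing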
Guarantees that no adversarial example exists within a bounded region of the input space
Direct methods are computationally intensive and limited in scope
Convex approximation gives a tractable upper bound on the worst-case loss
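One way to read the "convex approximation as an upper bound" point, written as a hedged formula with generic symbols (not from the slides): if a tractable convex bound dominates the true worst-case loss inside an epsilon-ball, then training against that bound certifies robustness inside the ball.

```latex
\max_{\|\delta\|_\infty \le \epsilon} L\big(f_\theta(x+\delta),\, y\big)
\;\le\; \bar{J}_\theta(x, y, \epsilon),
\qquad
\theta^\ast = \arg\min_\theta \; \mathbb{E}_{(x,y)}\big[\, \bar{J}_\theta(x, y, \epsilon) \,\big]
```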
Network distillation
A second model is trained on the (softened) predictions of a first model
Overcome by stronger attacks (e.g. C&W)
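A hedged sketch of the distillation step in PyTorch: the second (student) model is trained to match the first (teacher) model's temperature-softened prediction distribution instead of hard labels. The temperature `T` and the other names are placeholder assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=20.0):
    """Train the second model on the first model's softened predictions."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)     # teacher's soft labels
    log_probs = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions, scaled by T^2.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
```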
Adversarial Detection
A classifier separates adversarial images from 'clean' images
Overcome by including the detector in the attack's objective function
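A hedged sketch of one simple detection setup: a small binary classifier is trained on some feature representation of the input (for example, hidden activations of the protected model) to separate adversarial from clean images. The 512-dimensional feature size and all names are placeholder assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical detector: a small MLP over a feature representation of the input,
# with labels 0 = clean image, 1 = adversarial image.
detector = nn.Sequential(
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
)

def detector_loss(features_batch, is_adversarial):
    """Standard cross-entropy for the clean-vs-adversarial classifier."""
    return F.cross_entropy(detector(features_batch), is_adversarial)
```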
Input reconstruction
Scrub adversarial images ‘clean’
Overcome by adaptive attacks that target the reconstruction step
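A hedged sketch of input reconstruction as a preprocessing defense, using a denoising autoencoder as one possible choice (not specified in the slides): the perturbation is scrubbed before the image reaches the classifier. The architecture, the 28x28 grayscale input assumption, and all names are placeholders.

```python
import torch.nn as nn

# Hypothetical denoising autoencoder for single-channel images, trained offline
# on (noisy input, clean target) pairs.
reconstructor = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)

def defended_predict(classifier, x):
    """'Scrub' the (possibly adversarial) input, then classify the result."""
    return classifier(reconstructor(x))
```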
Ensemble of defenses
Ensemble of models of the above defenses
Can be overcome if the underlying defenses are weak
Express degree of certainty:
“Know when they do not know”
Gaussian Process Hybrid Deep Neural Networks
Expresses latent variables as parameters of a Gaussian distribution
"Capsule" networks for images
Inductive biases of new model architectures
Definition of an adversarial example
Studies mostly limited to Lp-norm perturbations in images
No standard definition for discrete domains like NLP
Standards for robustness evaluation
Benchmarks such as CleverHans
Certification & guarantees
Is an ultimately robust model possible?
Adversarial examples exist whenever there is classification error
NLP
Other neural network architectures
https://slides.com/alvinchan/resilient-ai-6