Threat Modeling ✨Magic✨
"Threat modeling is a process of identifying potential security threats and vulnerabilities in a system, application, or network infrastructure. It involves a systematic approach to identify, prioritize, and mitigate security risks. The goal of threat modeling is to understand the security posture of a system and to develop a plan to prevent or minimize the impact of potential attacks."
1. Defining the scope. (What are we working on?)
2. Building a model of the system. (Dataflow diagram)
3. Thinking about what can go wrong. (Threat scenarios)
4. Reviewing and iterating on what we could do better. (Bypasses and defense in depth)
5. Designing what we can do about it. (Mitigations)
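As a rough, hypothetical sketch of what these steps can produce, the Python snippet below records a tiny system model as plain data. The components, dataflows, threats, and mitigations are invented examples, and this is just one possible way to capture a threat model, not a prescribed format:

```python
# Minimal, hypothetical sketch of recording a threat model as data.
# The components, threats, and mitigations below are invented examples.
from dataclasses import dataclass, field

@dataclass
class Dataflow:
    source: str
    destination: str
    description: str

@dataclass
class Threat:
    scenario: str            # what can go wrong
    affected_flow: Dataflow
    mitigations: list = field(default_factory=list)  # what we can do about it

# Scope + model of the system: a tiny dataflow diagram in code
user_to_api = Dataflow("User", "Inference API", "Prediction requests over HTTPS")
api_to_model = Dataflow("Inference API", "ML Model", "Feature vectors")

# Threat scenarios and their mitigations
threats = [
    Threat("Attacker floods the API with crafted queries to steal the model",
           user_to_api,
           ["Rate limiting", "Query logging and anomaly detection"]),
    Threat("Tampered features sent to the model to force misclassification",
           api_to_model,
           ["Input validation", "Adversarial robustness testing"]),
]

for t in threats:
    print(f"{t.affected_flow.source} -> {t.affected_flow.destination}: {t.scenario}")
    print(f"  mitigations: {', '.join(t.mitigations)}")
```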
Name | Description |
---|---|
STRIDE | Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege |
DREAD | Damage, Reproducibility, Exploitability, Affected Users, and Discoverability |
PASTA | Process for Attack Simulation and Threat Analysis |
CAPEC | Common Attack Pattern Enumeration and Classification |
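To make one of these concrete: DREAD is a simple scoring scheme in which each threat is rated (commonly 1 to 10) on the five categories and the ratings are averaged to rank risks. A minimal sketch with invented threats and scores:

```python
# Hypothetical DREAD scoring sketch: the threats and scores are invented examples.
# Each category is rated 1-10 and the average gives a rough risk ranking.
threats = {
    "Model stealing via public API": {
        "Damage": 7, "Reproducibility": 8, "Exploitability": 6,
        "Affected Users": 9, "Discoverability": 5,
    },
    "Training data poisoning": {
        "Damage": 9, "Reproducibility": 4, "Exploitability": 3,
        "Affected Users": 9, "Discoverability": 3,
    },
}

# Rank threats from highest to lowest average DREAD score.
for name, scores in sorted(threats.items(), key=lambda kv: -sum(kv[1].values())):
    risk = sum(scores.values()) / len(scores)
    print(f"{name}: DREAD risk = {risk:.1f}")
```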
"Machine learning is a subfield of artificial intelligence that involves developing algorithms and statistical models that enable computer systems to learn and improve from experience without being explicitly programmed."
Data, data, data, data.
- Supervised learning: the most common type; trained on labeled data to make predictions.
- Unsupervised learning: trained on unlabeled data to identify clusters of similarity or reduce data dimensionality.
- Semi-supervised learning: a mix of the previous two.
- Reinforcement learning: performance improves based on feedback from a rewards system; commonly seen in game bots.
- Deep learning: deep neural networks with multiple layers of nodes; seen in image and speech recognition, natural language processing, and recommendations.
- Transfer learning: based on pre-trained models, fine-tuned for performance on the task at hand.
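As a small illustration of the first two categories, the sketch below uses scikit-learn (a library choice of this example, not something the text prescribes) to train a supervised classifier on labeled data and an unsupervised clustering model on the same data without its labels:

```python
# Sketch contrasting supervised and unsupervised learning with scikit-learn.
# The dataset and model choices are illustrative, not from the original text.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised: trained on labeled data, then makes predictions on new data.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("supervised accuracy:", clf.score(X_test, y_test))

# Unsupervised: trained on the same data without labels, identifies clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
print("cluster assignments:", km.predict(X_test)[:10])
```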
Name | Description |
---|---|
Decision Trees | Split the data into smaller subsets and recursively repeat the process, creating a tree of decisions |
Random Forests | Combination of multiple decision trees |
SVMs (Support Vector Machines) | Classification task by separating data into different classes (finding hyperplanes) |
Naive Bayes | Probabilistic algorithm for classification. Computes probability of data belonging to a given class |
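The following sketch fits each of the algorithms from the table on the same toy dataset; scikit-learn, the synthetic data, and the parameters are illustrative choices rather than anything mandated above:

```python
# Illustrative comparison of the four classifiers listed above using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),                 # finds a separating hyperplane
    "Naive Bayes": GaussianNB(),  # probabilistic, per-class probabilities
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```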
We need to understand what kind of learning the model will use and what kind of data.
We need to understand what kind of usage this model will have: who is calling it, and from where.
We need to understand where this model is running and with what supporting code.
The attacker stealthily modifies the query to get a desired response. This breach of model input integrity leads to fuzzing-style attacks that compromise the model’s classification performance.
In classification attacks, an attacker attempts to deceive the model by manipulating the input data. The attacker aims to have the input misclassified in a way that benefits them.
We have multiple sub-types of this attack:
Generate a sample that is not in the input class of the target classifier but gets classified as such by the model.
The sample can appear to be random noise to human eyes, but the attacker has some knowledge of the target machine learning system and generates white noise that is not random, exploiting specific aspects of the target model.
Causing a malicious image to bypass a content filter by injecting targeted noise
An attempt by an attacker to get a model to return their desired label for a given input.
This usually forces a model to return a false positive or false negative. The end result is a subtle takeover of the model’s classification accuracy, whereby an attacker can induce specific bypasses at will.
The attacker’s target classification can be anything other than the legitimate source classification
Causing a car to identify a stop sign as something else
(Same as Target Misclassification)
An attacker can craft inputs to reduce the confidence level of correct classification, especially in high-consequence scenarios.
This can also take the form of a large number of false positives meant to overwhelm administrators or monitoring systems with fraudulent alerts indistinguishable from legitimate alerts.
(Same as Target Misclassification)
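To make the mechanics of such input perturbations concrete, here is a minimal FGSM-style sketch in PyTorch. The model, data, and epsilon are invented for illustration; it only shows how a gradient-guided perturbation is formed, not a reliable end-to-end attack:

```python
# Hypothetical FGSM-style perturbation sketch in PyTorch: the model, data, and
# epsilon are invented; real attacks tune these against the actual target system.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(1, 20)             # a legitimate input
true_label = torch.tensor([0])
loss_fn = nn.CrossEntropyLoss()

# Compute the gradient of the loss with respect to the input.
x_adv = x.clone().requires_grad_(True)
loss = loss_fn(model(x_adv), true_label)
loss.backward()

# Take a small step in the direction that increases the loss (FGSM).
epsilon = 0.1
perturbed = x_adv + epsilon * x_adv.grad.sign()

print("original prediction: ", model(x).argmax(dim=1).item())
print("perturbed prediction:", model(perturbed).argmax(dim=1).item())
```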
An attacker manipulates the training data used to train the model.
The goal of a poisoning attack is to introduce malicious data points into the training set, contaminating the model generated during the training phase so that its predictions on new, unseen data are modified in the testing phase.
Submitting AV software as malware to force its misclassification as malicious and eliminate the use of targeted AV software on client systems
Goal is to ruin the quality/integrity of the data set being attacked.
Many datasets are public/untrusted/uncurated, so this creates additional concerns around the ability to spot such data integrity violations in the first place.
A company scrapes a well-known and trusted website for oil futures data to train their models. The data provider’s website is subsequently compromised via SQL Injection attack. The attacker can poison the dataset at will and the model being trained has no notion that the data is tainted.
Same as in Targeted Data Poisoning
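A toy sketch of the indiscriminate case: flipping the labels of a fraction of the training set degrades the trained model's accuracy on clean test data. The dataset, model, and poisoning rate below are all invented for illustration:

```python
# Illustrative label-flipping poisoning sketch with scikit-learn; the dataset,
# model, and poisoning rate are invented to show the effect, not a real attack.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Attacker flips the labels of 30% of the training set.
rng = np.random.default_rng(0)
poisoned_y = y_train.copy()
idx = rng.choice(len(poisoned_y), size=int(0.3 * len(poisoned_y)), replace=False)
poisoned_y[idx] = 1 - poisoned_y[idx]
poisoned = LogisticRegression(max_iter=1000).fit(X_train, poisoned_y)

print("clean model accuracy:   ", clean.score(X_test, y_test))
print("poisoned model accuracy:", poisoned.score(X_test, y_test))
```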
The private features used in machine learning models can be recovered.
This includes reconstructing private training data that the attacker does not have access to. (Hill Climbing)
Recovering an image using only the person's name and access to the facial recognition system
The attacker can determine whether a given data record was part of the model’s training dataset or not.
Predict a patient’s main procedure (e.g., the surgery the patient went through) based on their attributes (e.g., age, gender, hospital)
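One simple (and deliberately naive) way to illustrate membership inference is a confidence-threshold test: overfit models tend to be more confident on records they were trained on. The dataset, victim model, and threshold below are invented:

```python
# Hypothetical membership-inference sketch using a confidence-threshold test.
# Dataset, model, and threshold are invented; real attacks are more involved.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0)

# The victim model is trained only on the "member" half and tends to be
# more confident on records it has effectively memorized.
victim = RandomForestClassifier(random_state=0).fit(X_member, y_member)

member_conf = victim.predict_proba(X_member).max(axis=1)
nonmember_conf = victim.predict_proba(X_nonmember).max(axis=1)

threshold = 0.9  # guess "was in the training set" when confidence is high
print("members flagged:    ", np.mean(member_conf > threshold))
print("non-members flagged:", np.mean(nonmember_conf > threshold))
```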
Differential privacy is a concept in data privacy that aims to provide strong privacy guarantees for individuals whose data is used in statistical analysis. The idea is to design algorithms that can answer queries about a dataset while protecting the privacy of individuals whose data is included in the dataset.
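A minimal sketch of one standard building block, the Laplace mechanism: noise scaled to the query's sensitivity divided by epsilon is added to the answer, bounding how much any single individual's record can influence the output. The dataset, query, and epsilon values below are invented:

```python
# Sketch of the Laplace mechanism, a common building block for differential
# privacy. The dataset, query, and epsilon values are invented for illustration.
import numpy as np

ages = np.array([34, 45, 29, 61, 50, 38, 42])  # hypothetical private records

def private_count_over_40(data, epsilon):
    true_count = int(np.sum(data > 40))
    sensitivity = 1  # adding/removing one person changes the count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

print("noisy count (epsilon=0.5):", private_count_over_40(ages, 0.5))
print("noisy count (epsilon=5.0):", private_count_over_40(ages, 5.0))
```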
Model stacking, also known as stacked generalization, is a technique in machine learning where multiple predictive models are combined to improve the accuracy of predictions. In model stacking, the output of several individual models is used as input for a final model, which makes the final prediction.
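A brief scikit-learn sketch of stacking, where the predictions of several base models feed a final meta-model; the base models and dataset are arbitrary illustrative choices:

```python
# Illustrative model-stacking sketch with scikit-learn's StackingClassifier;
# the base models and dataset are arbitrary choices for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The base models' predictions become the inputs of a final (meta) model.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
print("stacked model accuracy:", stack.score(X_test, y_test))
```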
The attackers recreate the underlying model by legitimately querying the model.
Call an API to get scores and, based on some properties of the malware, craft evasions
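A toy sketch of the idea, with every component invented: the attacker labels their own query set with the victim model's responses and fits a surrogate that mimics the victim's decision boundary:

```python
# Toy model-stealing sketch: the "victim" is queried like a black box and a
# surrogate model is trained on its outputs. All components are invented.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X_private, X_query, y_private, _ = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

# The victim model; the attacker only has query access to .predict().
victim = RandomForestClassifier(random_state=0).fit(X_private, y_private)

# The attacker labels their own query set with the victim's answers and trains
# a surrogate that approximates the victim's decision boundary.
stolen_labels = victim.predict(X_query)
surrogate = DecisionTreeClassifier(random_state=0).fit(X_query, stolen_labels)

agreement = (surrogate.predict(X_private) == victim.predict(X_private)).mean()
print(f"surrogate agrees with victim on {agreement:.0%} of inputs")
```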
By means of a specially crafted query from an adversary, machine learning systems can be reprogrammed to a task that deviates from the creator’s original intent.
A sticker is added to an image to trigger a specific response from the network. The attacker trains a separate network to generate the patch, which can then be added to any image to fool the target network and make the car speed up instead of slowing down when detecting a crosswalk.
Owing to the large resources (data + computation) required to train algorithms, the current practice is to reuse models trained by large corporations and modify them slightly for the task at hand. These can be biased or trained with compromised data.
The training process is outsourced to a malicious third party who tampers with the training data and delivers a trojaned model which forces targeted misclassifications, such as classifying a certain virus as non-malicious.
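One simple, partial defense when reusing third-party artifacts (the file name and expected hash below are placeholders): pin and verify the model file's hash before loading it, so silent tampering in transit or at rest is at least detected. This does not protect against a provider that trains a trojaned model in the first place:

```python
# Hypothetical supply-chain check: verify a downloaded model artifact against a
# pinned hash before loading it. The file name and expected hash are placeholders.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def verify_model(path: str, expected_sha256: str = EXPECTED_SHA256) -> bool:
    """Return True only if the artifact on disk matches the pinned digest."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest == expected_sha256

# Intended usage before deserializing any third-party model:
#   if not verify_model("pretrained_model.bin"):
#       raise RuntimeError("Artifact does not match the pinned hash; refusing to load.")
```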