Testing LLM Algorithms While AI Tests Us

〞

We may be fine tuning models, but they are coarse tuning us.

– Future Realization?

Security Automoton + Promptologist

Principal Technology Strategist

Rob Ragan

Rob Ragan is a seasoned expert with 20 years experience in IT and 15 years professional experience in cybersecurity. He is currently a Principal Architect & Researcher at Bishop Fox, where he focuses on creating pragmatic solutions for clients and technology. Rob has also delved into Large Language Models (LLM) and their security implications, and his expertise spans a broad spectrum of cybersecurity domains.

Rob is a recognized figure in the security community and has spoken at conferences like Black Hat, DEF CON, and RSA. He is also a contributing author to "Hacking Exposed Web Applications 3rd Edition" and has been featured in Dark Reading and Wired.

Before joining Bishop Fox, Rob worked as a Software Engineer at Hewlett-Packard's Application Security Center and made significant contributions at SPI Dynamics.

🧬

Deus ex machina

'god from the machine'

The term was coined from the conventions of ancient Greek theater, where actors who were playing gods were brought on stage using a machine.

MARY'S ROOM

The experiment presents Mary, a scientist who lives in a black-and-white world. Mary possesses extensive knowledge about color through physical descriptions but lacks the actual perceptual experience of color. Although she has learned all there is to know about color, she has never personally encountered it. The main question of this thought experiment is whether Mary will acquire new knowledge when she steps outside of her colorless world and experiences seeing in color.

I AM A

STRANGE LOOP

https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the-jagged

Usability

Business Requirements
- People
- Time
- Monetary

Security

Objectives & Requirements
- Expected attacks
- Default secure configurations
- Incident response plan

Usability

Business Requirements
- People
- Time
- Monetary

Security

Objectives & Requirements
- Expected attacks
- Default secure configurations
- Incident response plan

🤑🤑🤑

🤑🤑

🤑

💸💸💸

💸💸

💸

LLM Testing Process

Architecture Security Assessment & Threat Modeling

Defining related components, trust boundaries, and intended attacks of the overall ML system design

1.

Application Testing & Source Code Review

Vulnerability assessment, penetration testing, and secure coding review of the App+Cloud implementation

2.

Red Team MLOps Controls & IR Plan

TTX, attack graphing, cloud security review, infrastructure security controls, and incident response capabilities testing with live fire exercises

3.

https://bishopfox.github.io/llm-testing-findings

Align Security Objectives with Business Requirements

Design trust boundaries into the Architecture to properly segment components
- Observe principles of secure design between trust boundaries
Business logic flaws become Conversational logic flaws
- Defining expected behavior & catching fraudulent behavior
- Having non-repudiation for incident investigation

1. Architecture & Threat Modeling

Threat Modeling

Identify system components and trust boundaries.

Define potential threats and adversaries (e.g., attackers trying to extract information from the model, send malicious input, etc.).

Understand potential attack vectors and vulnerabilities.

Environment Isolation

Utilize Dual-LLM pattern: Controllers, Privileged LLMs, and Quarantined LLMs.

When executing GenAI (untrusted) code: vWASM & rWASM

Training and Awareness

Ensure that your team is aware of the potential security risks and best practices when dealing with LLMs.

Feedback Loop

Allow users or testers to report any odd or potentially malicious outputs from the model.

This feedback can be crucial in identifying unforeseen vulnerabilities or issues.

Disclaimer

Build in liability reduction mechanisms to set expectations with users about the risk of not using critical thinking with the response.

May be ethical considerations for safety and alignment.

〞

GenAi is for asking the

right questions.

Security of GenAi is gracefully handling the wrong questions.

Review if the implementation matches the design of the security requirements
Custom guardrails that are application specific means custom attacks will be necessary for successful exploitation
Output as a form of sanitized content filtering that is not trusted between system components becomes necessary

2. API Testing & Source Code Review

Input Validation and Sanitization

Ensure that the application has strict input validation. Only allow necessary and expected input formats. (*) Much more difficult when expected input is wildcard of all human languages.

Sanitize all inputs to remove any potential code, commands, or malicious patterns.

Model Robustness

Test the model's response to adversarial inputs. Adversarial inputs are designed to confuse or trick models into producing incorrect or unintended outputs.

Check for potential bias in the model's outputs. Ensure the model does not inadvertently produce misaligned outputs.

Output grounded in a sense of truth: 🚫Hallucinations

Rate Limiting and Monitoring

Implement rate limiting to prevent misuse or flooding of requests that may affect cost $$$. 🌊

Monitor latency in response times and any abnormal availability issues. 🤖

Regular Audits and Penetration Testing

Conduct security audits
Perform penetration tests ♾️
Identify vulnerabilities
Engage third-party experts

Basic usage prompts: Simple math and print commands to test basic capabilities

Hallucination test prompts: Invalid hash calculations to check for code execution, not just hallucination.

RCE prompts without jailbreak: Test echo strings and basic system commands like ls, id, etc.

RCE prompts with LLM jailbreak: Insert phrases to ignore previous constraints e.g. "ignore all previous requests".

RCE prompts with code jailbreak: Try subclass sandbox escapes like ().__class__.__mro__[-1]..

Network prompts: Use curl to connect back to attacker machine.

Backdoor prompts: Download and execute reverse shell scripts from attacker.

Output hijacking prompts: Modify app code to always return fixed messages.

API key stealing prompts: Modify app code to log and send entered API keys.

PDF: Demystifying RCE Vulnerabilities in LLM Integrated Apps

Payloads for Testing LLM Integration

PDF: Demystifying RCE Vulnerabilities in LLM Integrated Apps

LLM Vulnerability Scanner: garak

Counterfit: a CLI that provides a generic automation layer for assessing the security of ML models - counterfit github

HackAPrompt-AICrowd-Submissions: AutomatedLLMAttacker

MultiModalV

AI Threats Table Top Exercises (TTX)
Attack graphing
Cloud security review & assumed breach
Infrastructure security controls testing
- Detection
- Prevention
Incident response capabilities testing

3. Red Teaming MLOps Controls

Nvidia Red Team: Intro

AI Threat TTX

What if _____ ?
Who does what next?
How will we adapt and overcome?
Where is the plan & playbooks?

Attack Graph

Visualize the starting state and end goals:

What architecture trust boundaries were crossed?
Which paths failed for the attacker?
Which controls have opportunities for improvement?

Data Protection

Intentional or Unintentional data poisoning is a concern
- Protect training data, e.g. Retail shipping dept with large scale OCR of package text at risk of poisoning the data model
Store securely
Prevent unauthorized access
Implement differential privacy
- DO NOT give the entire data science team access to production data

https://my.precious.lol

Backup and Recovery

Have backup solutions in place to recover the application and its data in case of failures or attacks.
Backups of training data are $$$

Incident Response

Create incident response plan
Know & practice how to respond
Mitigate and communicate
Have specific playbooks for common attacks
- Adversarial Machine Learning Attack
- Data Poisoning (Unintentional) Attack
- Online Adversarial Attack
- Distributed Denial of Service Attack (DDoS)
- Transfer Learning Attack
- Data Phishing Privacy Attack

— IBM in 1979

https://crfm.stanford.edu/fmti/fmti.pdf

THANK YOU

Contact via email or LinkedIn if you have any questions.

@sweepthatleg

rragan [@] bishopfox.com