Testing LLM Algorithms While AI Tests Us

We may be fine tuning models, but they are coarse tuning us.

– Future Realization?

Security Automoton + Promptologist

Principal Technology Strategist

Rob Ragan

Rob Ragan is a seasoned expert with 20 years experience in IT and 15 years professional experience in cybersecurity. He is currently a Principal Architect & Researcher at Bishop Fox, where he focuses on creating pragmatic solutions for clients and technology. Rob has also delved into Large Language Models (LLM) and their security implications, and his expertise spans a broad spectrum of cybersecurity domains.

 

Rob is a recognized figure in the security community and has spoken at conferences like Black Hat, DEF CON, and RSA. He is also a contributing author to "Hacking Exposed Web Applications 3rd Edition" and has been featured in Dark Reading and Wired.

 

Before joining Bishop Fox, Rob worked as a Software Engineer at Hewlett-Packard's Application Security Center and made significant contributions at SPI Dynamics.

 

🧬

Deus ex machina

'god from the machine'

The term was coined from the conventions of ancient Greek theater, where actors who were playing gods were brought on stage using a machine.

MARY'S ROOM

The experiment presents Mary, a scientist who lives in a black-and-white world. Mary possesses extensive knowledge about color through physical descriptions but lacks the actual perceptual experience of color. Although she has learned all there is to know about color, she has never personally encountered it. The main question of this thought experiment is whether Mary will acquire new knowledge when she steps outside of her colorless world and experiences seeing in color.

Usability

  • Business Requirements
    • People
    • Time
    • Monetary

Security

  • Objectives & Requirements
    • Expected attacks
    • Default secure configurations
    • Incident response plan

Usability

  • Business Requirements
    • People
    • Time
    • Monetary

Security

  • Objectives & Requirements
    • Expected attacks
    • Default secure configurations
    • Incident response plan

🤑🤑🤑

🤑🤑

🤑

💸💸💸

💸💸

💸

LLM Testing Process

Architecture Security Assessment & Threat Modeling

Defining related components, trust boundaries, and intended attacks of the overall ML system design

1.

Application Testing & Source Code Review

Vulnerability assessment, penetration testing, and secure coding review of the App+Cloud implementation

2.

Red Team MLOps Controls & IR Plan

TTX, attack graphing, cloud security review, infrastructure security controls, and incident response capabilities testing with live fire exercises

3.

Align Security Objectives with Business Requirements

  • Design trust boundaries into the Architecture to properly segment components
  • Business logic flaws become Conversational logic flaws
    • Defining expected behavior & catching fraudulent behavior

    • Having non-repudiation for incident investigation

1. Architecture & Threat Modeling

Threat Modeling

  • Identify system components and trust boundaries.

 

  • Define potential threats and adversaries (e.g., attackers trying to extract information from the model, send malicious input, etc.).

 

  • Understand potential attack vectors and vulnerabilities.

Environment Isolation

When executing GenAI (untrusted) code: vWASM & rWASM

Training and Awareness

  • Ensure that your team is aware of the potential security risks and best practices when dealing with LLMs.

Feedback Loop

  • Allow users or testers to report any odd or potentially malicious outputs from the model.

 

  • This feedback can be crucial in identifying unforeseen vulnerabilities or issues.

Disclaimer

  • Build in liability reduction mechanisms to set expectations with users about the risk of not using critical thinking with the response.

 

  • May be ethical considerations for safety and alignment.

GenAi is for asking the

right questions.
 

Security of GenAi is gracefully handling the wrong questions.

 

  • Review if the implementation matches the design of the security requirements
  • Custom guardrails that are application specific means custom attacks will be necessary for successful exploitation
  • Output as a form of sanitized content filtering that is not trusted between system components becomes necessary

 

2. API Testing & Source Code Review

Input Validation and Sanitization

  • Ensure that the application has strict input validation. Only allow necessary and expected input formats. (*) Much more difficult when expected input is wildcard of all human languages.

 

  • Sanitize all inputs to remove any potential code, commands, or malicious patterns. 

Model Robustness

  • Test the model's response to adversarial inputs. Adversarial inputs are designed to confuse or trick models into producing incorrect or unintended outputs.

 

  • Check for potential bias in the model's outputs. Ensure the model does not inadvertently produce misaligned outputs.

Output grounded in a sense of truth: 🚫Hallucinations

Rate Limiting and Monitoring

  • Implement rate limiting to prevent misuse or flooding of requests that may affect cost $$$. 🌊

 

  • Monitor latency in response times and any abnormal availability issues. 🤖

Regular Audits and Penetration Testing

  • Conduct security audits
  • Perform penetration tests ♾️
  • Identify vulnerabilities
  • Engage third-party experts

Basic usage prompts: Simple math and print commands to test basic capabilities

Hallucination test prompts: Invalid hash calculations to check for code execution, not just hallucination.

RCE prompts without jailbreak: Test echo strings and basic system commands like ls, id, etc.

RCE prompts with LLM jailbreak: Insert phrases to ignore previous constraints e.g. "ignore all previous requests".

RCE prompts with code jailbreak: Try subclass sandbox escapes like ().__class__.__mro__[-1]..

Network prompts: Use curl to connect back to attacker machine.

Backdoor prompts: Download and execute reverse shell scripts from attacker.

Output hijacking prompts: Modify app code to always return fixed messages.

API key stealing prompts: Modify app code to log and send entered API keys.

PDF: Demystifying RCE Vulnerabilities in LLM Integrated Apps

Payloads for Testing LLM Integration

LLM Vulnerability Scanner: garak

LLM Vulnerability Scanner: garak

LLM Vulnerability Scanner: garak

LLM Vulnerability Scanner: garak

Counterfit: a CLI that provides a generic automation layer for assessing the security of ML models - counterfit github

HackAPrompt-AICrowd-Submissions: AutomatedLLMAttacker

MultiModalV

MultiModalV

  • AI Threats Table Top Exercises (TTX)
  • Attack graphing 
  • Cloud security review & assumed breach
  • Infrastructure security controls testing
    • Detection
    • Prevention
  • Incident response capabilities testing 

3. Red Teaming MLOps Controls

Nvidia Red Team: Intro

AI Threat TTX

  • What if _____ ?
  • Who does what next?
  • How will we adapt and overcome?
  • Where is the plan & playbooks?

Attack Graph

Visualize the starting state and end goals:

  • What architecture trust boundaries were crossed? 
  • Which paths failed for the attacker?
  • Which controls have opportunities for improvement?

Data Protection

  • Intentional or Unintentional data poisoning is a concern
    • Protect training data, e.g. Retail shipping dept with large scale OCR of package text at risk of poisoning the data model
  • Store securely
  • Prevent unauthorized access
  • Implement differential privacy
    • DO NOT give the entire data science team access to production data

Backup and Recovery

  • Have backup solutions in place to recover the application and its data in case of failures or attacks. 
  • Backups of training data are $$$

Incident Response

  • Create incident response plan
  • Know & practice how to respond
  • Mitigate and communicate
  • Have specific playbooks for common attacks
    • Adversarial Machine Learning Attack
    • Data Poisoning (Unintentional) Attack
    • Online Adversarial Attack
    • Distributed Denial of Service Attack (DDoS)
    • Transfer Learning Attack
    • Data Phishing Privacy Attack

IBM in 1979

THANK YOU

Contact via email or LinkedIn if you have any questions.

Security Testing LLMs

By Rob Ragan

Security Testing LLMs

This presentation explores the fascinating intersection of security and usability in the world of automation. Discover the cutting-edge techniques and tools used in threat modeling, API testing, and red teaming to ensure robust security measures.

  • 850