Real Attacking and Defending in Cybersecurity
Sebastian Garcia. Stratosphere Lab. CTU University
AI and ML for Cybersecurity to help others.
https://www.stratosphereips.org/
In the Stratosphere Lab
They generalize to any environment
They do not need further training
This computer does not exists
https://arxiv.org/abs/2309.00155, https://github.com/stratosphereips/shelLM
https://github.com/stratosphereips/VelLMes-AI-Deception-Framework
Can we demo?
Personality Prompt
Instructs how to behave and respond
Needs to be carefully and iteratively developed
Includes the main “personality”
“You are a Linux shell. Respond only to Linux commands.”
But also many examples of desired behavior
Many instructions to avoid pitfalls
Also, fine-tunning
Fine-tuning
Training of the LLM for specific tasks
A much smaller dataset is needed
Our dataset had 112 training and 21 validation samples
After fine-tuning, the personality prompt is much shorter
LLMs can be used as honeypots
34 human attackers took part in the experiment
shelLM outperformed Cowrie
Fooled 1/2 of the attackers
Evaluation of deception capabilities
ssh tomas@olympus.felk.cvut.cz -p 1337
Password: tomy
Want to play yourself?
An autonomous LLM Attacker for on real SSH servers
Can we use LLM for attacking? Yes
Not attacking only, but planning, reasoning, and executing
The user just gives a goal:
https://www.stratosphereips.org/blog/2025/2/24/introducing-aracne-a-new-llm-based-shell-pentesting-agent
Demo!
https://overthewire.org/wargames/bandit/
1440 experiments
Not everything works the same
For us:
Interpreter llama3.1
Planner o3-mini-2025-01-31
Summarizer gpt-4o-2024-08-06
Most LLMs have huge variability
Jail breaking
Memory is crucial
SFT can help a lot. Probably mandatory
https://cybersecurity.bsy.fel.cvut.cz/
Sebastian Garcia
https://bsky.app/profile/eldraco.bsky.social
https://infosec.exchange/@eldraco
https://www.linkedin.com/in/sebagarcia/
http://stratosphereips.org