Bots, Bots Everywhere
Work in Progress
Iliana FAYOLLE*, Sihem BOUHENNICHE*,
Samuel PELISSIER, Clémentine MAURICE, Walter RUDAMETKIN
GDR Sécurité - Journées SSLR
9 October 2025

* These authors contributed equally to this work.

FACT
A growing share of global internet traffic comes from bots,
especially bad ones. [1, 2]

BAD BOT IMPERVA REPORT [1]
FACT
(Figures not reproduced; sources: [3, 4, 5, 6, 7, 8])
robots.txt [8, 9]
🤖 Nice_bot: Hi! Send me your robots.txt please. :)
💻 Web server: Hi! Sure! Here it is:
📃 robots.txt
User-Agent: Nice_bot
User-Agent: Mean_bot
Disallow: /
🤖 Nice_bot: Ok! Bye!

🤖 Mean_bot: Send me your robots.txt
💻 Web server: Hi! Sure! Here it is:
📃 robots.txt
User-Agent: Nice_bot
User-Agent: Mean_bot
Disallow: /
🤖 Mean_bot: Whatever.

💻 Web server: ...
🤖
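A minimal sketch of how a compliant bot could interpret the robots.txt shown on the slide, using Python's standard urllib.robotparser. The helper name is_allowed and the extra agent Other_bot are illustrative assumptions. The check runs on the bot's side only, so nothing here prevents Mean_bot from ignoring the answer.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content from the slide: both named bots are disallowed everywhere.
ROBOTS_TXT = """\
User-Agent: Nice_bot
User-Agent: Mean_bot
Disallow: /
"""


def is_allowed(user_agent: str, path: str = "/") -> bool:
    """Return True if robots.txt lets `user_agent` fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Other_bot is an illustrative agent that the file does not mention.
    for bot in ("Nice_bot", "Mean_bot", "Other_bot"):
        print(bot, "allowed" if is_allowed(bot) else "disallowed")
```

Nice_bot would stop after this check; Mean_bot receives the same answer and can simply carry on crawling, which is the gap the slide illustrates.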
IPs & User-Agents blocking [10]
💻 Web server: Your ID please.
🤖: Sure.
IP: 20.42.10.176
UA: GPTBOT
💻 Web server: No robots allowed. 🚪 ⛔

💻 Web server: Your ID please.
🤖 Disguised_bot: Sure.
IP: 20.42.10.176
UA: Mozilla/5.0 (iPhone14,2; *******)
💻 Web server: Welcome! 🚪 ✅
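A minimal sketch of IP and User-Agent blocking, assuming a substring match on the UA and a small network blocklist. The blocklist contents and the function name is_blocked are hypothetical (203.0.113.0/24 is a documentation-only range). It shows why the same client gets in once it changes its User-Agent.

```python
import ipaddress

# Hypothetical blocklists, for illustration only.
BLOCKED_UA_TOKENS = ("gptbot",)
BLOCKED_NETWORKS = (ipaddress.ip_network("203.0.113.0/24"),)


def is_blocked(ip: str, user_agent: str) -> bool:
    """Block a request if its IP is in a listed network or its UA contains a listed token."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    # Same IP as on the slide, two different self-declared identities.
    print(is_blocked("20.42.10.176", "GPTBOT"))
    print(is_blocked("20.42.10.176", "Mozilla/5.0 (iPhone14,2; *******)"))
```

The second call is not blocked because the User-Agent is self-declared and the IP is not on the list.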
Challenge-based solutions: CAPTCHAs [11, 12, 13] & Proof of Work [14]
💻 Web server: Please solve this. 🚪
🤖 💧💧💧
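A minimal sketch of a hashcash-style proof-of-work challenge, assuming SHA-256 and a required number of leading zero hex digits. This is a generic illustration, not the exact scheme used by Anubis [14]. The names solve and verify and the challenge string are hypothetical.

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash checks that the digest starts with `difficulty` zero hex digits."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces in order until one passes verification."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    challenge = "example-challenge"  # hypothetical server-issued value
    nonce = solve(challenge, difficulty=3)
    print(nonce, verify(challenge, nonce, difficulty=3))
```

The asymmetry is the point: the client needs many hashes on average, while the server verifies with one. The cost is small for a single visitor but adds up for a crawler fetching many pages.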
Fingerprinting strategies: TLS [17] & Browser [7, 15, 16]
💻 Web server: Your ID please.
🤖 Disguised_bot:
IP: 20.42.10.176
UA: Mozilla/5.0 (iPhone14,2; *******)
Screen width: 1920;
Screen height: 1080;
Timezone: UTC+02:00;
💻 Web server: Hmm... 🚪 ⛔
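A minimal sketch of the inconsistency check suggested by the slide: the User-Agent claims an iPhone, but the reported screen is 1920 x 1080. The Fingerprint type, the MAX_PHONE_SIDE threshold, and the single rule are illustrative assumptions, not the detection logic of any cited system.

```python
from dataclasses import dataclass


@dataclass
class Fingerprint:
    user_agent: str
    screen_width: int
    screen_height: int


# Assumption: a phone's shorter screen side, in CSS pixels, stays well below this value.
MAX_PHONE_SIDE = 600


def inconsistencies(fp: Fingerprint) -> list[str]:
    """Return human-readable mismatches between the claimed device and the observed attributes."""
    issues = []
    claims_iphone = "iphone" in fp.user_agent.lower()
    shorter_side = min(fp.screen_width, fp.screen_height)
    if claims_iphone and shorter_side > MAX_PHONE_SIDE:
        issues.append("UA claims an iPhone but the screen size looks like a desktop")
    return issues


if __name__ == "__main__":
    disguised = Fingerprint("Mozilla/5.0 (iPhone14,2; *******)", 1920, 1080)
    print(inconsistencies(disguised))
```

A real detector would combine many such rules across the TLS, HTTP, and browser layers; a single rule is shown only to make the idea concrete.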
Proprietary or commercial solutions: Cloudflare [8, 18]
💻 Web server: I'll check you out first. 🚪
🤖 💧

RESEARCH QUESTIONS
🔍 RQ1: What is the best technique for bot detection?
⚡ RQ2: Does combining techniques improve detection (especially against AI bots)?
🤖 RQ3: Do AI-driven bots have unique fingerprints compared to traditional bots?
METHODOLOGY
💻 Web server (main server)
- Visited by humans & bots
- 3,000 daily visits
- No defenses against bots
- Collects browser, HTTP & TLS fingerprints
- Invisibly linked to 🍯
💻 Web server (🍯)
- Visited by bots only
- Collects browser, HTTP & TLS fingerprints
- Acts as a reverse proxy toward 9 servers with different defenses against bots
Fingerprint layers, collected over time:
- TLS: ciphersuites, extensions ...
- HTTP headers: IPs, User-Agents ...
- JS: OS, window size ...
METHODOLOGY
💻 Web server in front of 9 servers 💻, each with its own defense:
- robots.txt
- nginx rules
- reCAPTCHA v3
- ProCAPTCHA
- Cloudflare Turnstile
- Anubis
- Mix
- Mix
- Cloudflare anti-bot mechanisms

IP: 20.42.10.176
UA: GPTBOT
📃 robots.txt
OUR FUTURE CONTRIBUTIONS
📊 C1: Build a large anonymized dataset of human and bot fingerprints (HTTP, TLS, browser) to reveal key connection patterns.
🧩 C2: Define criteria to distinguish humans, traditional bots, and LLM-driven bots across multiple fingerprinting levels.
⚙️ C3: Evaluate the performance of each detection method and the benefits of combining them.
RESULTS SO FAR
🔎
Identify Bots
→ No evidence of bots impersonating other bots so far
→ Some IPs use a different UA on each connection
→ Many connections arrive without a UA or cookies
- Unique IPs: 5,667; unique UAs: 4,153
- Total connections (IP + UA + cookie + local time): 32,371
→ Only 12% of unique IDs self-identify as bots
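A minimal sketch of how counts like the ones above could be derived from a connection log. The record layout (IP, UA pairs), the BOT_TOKENS list, and the sample data are hypothetical. The slide does not define how a "unique ID" is built, so the share below is computed over unique UAs as a stand-in.

```python
from collections import defaultdict

# Hypothetical substrings that mark a UA as a self-declared bot.
BOT_TOKENS = ("bot", "crawler", "spider")


def summarize(connections):
    """Summarize (ip, user_agent) pairs; an empty UA string means the header was missing."""
    uas_by_ip = defaultdict(set)
    missing_ua = 0
    for ip, ua in connections:
        seen = uas_by_ip[ip]  # registers the IP even when the UA is missing
        if ua:
            seen.add(ua)
        else:
            missing_ua += 1
    unique_uas = {ua for uas in uas_by_ip.values() for ua in uas}
    declared = {ua for ua in unique_uas if any(t in ua.lower() for t in BOT_TOKENS)}
    return {
        "unique_ips": len(uas_by_ip),
        "unique_uas": len(unique_uas),
        "ips_with_multiple_uas": sum(1 for uas in uas_by_ip.values() if len(uas) > 1),
        "missing_ua": missing_ua,
        "self_identified_share": len(declared) / len(unique_uas) if unique_uas else 0.0,
    }


if __name__ == "__main__":
    sample = [  # made-up records using documentation-only IP addresses
        ("198.51.100.1", "GPTBot/1.0"),
        ("198.51.100.1", "Mozilla/5.0"),
        ("198.51.100.2", ""),
        ("198.51.100.3", "Mozilla/5.0"),
    ]
    print(summarize(sample))
```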
RESULTS SO FAR
🛡️
Evaluate Defenses
→ Some bots seem to evade robots.txt and rules by switching User-Agents
→ No bot actually tests all defense layers
→ Certain defenses have not been bypassed by any bot
LIMITS
❌ No AI challenge testing
📉 Limited bot connections
🔍 No ground truth for human vs bot
⏱️ Some bots leave before fingerprinting or exit challenge pages early
🤏 Few connections from main server
🛡️ No bots test all defenses
CONCLUSION & FUTURE WORK
Continue passive data collection
Improve bot recognition
Track behavior on the site over time
Active testing of AI bots
REFERENCES
[1] BAD BOT IMPERVA REPORT. https://www.imperva.com/resources/wp-content/uploads/sites/6/reports/2025-Bad-Bot-Report.pdf. Accessed: 2025-06-10.
[2] CLOUDFLARE RADAR BOTS. https://radar.cloudflare.com/bots. Accessed: 2025-06-10.
[3] https://forum.chatons.org/t/mise-a-mal-des-forges-git-par-les-collecteurs-dia/7086. Accessed: 2025-08-10.
[4] https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html. Accessed: 2025-08-10.
[5] Declare your AIndependence: block AI bots, scrapers and crawlers with a single click. https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/. Accessed: 2025-08-10.
[6] LI, X., AZAD, B. A., RAHMATI, A., AND NIKIFORAKIS, N. Good bot, bad bot: Characterizing automated browsing activity. In SP (2021), IEEE, pp. 1589–1605.
[7] AZAD, B. A., STAROV, O., LAPERDRIX, P., AND NIKIFORAKIS, N. Web runner 2049: Evaluating third-party anti-bot services. In DIMVA (2020), vol. 12223 of Lecture Notes in Computer Science, Springer, pp. 135–159.
[8] LIU, E., LUO, E., SHAN, S., VOELKER, G. M., ZHAO, B. Y., AND SAVAGE, S. Somesite I used to crawl: Awareness, agency and efficacy in protecting content creators from AI crawlers. CoRR abs/2411.15091 (2024).
[9] Robots.txt. Introduction au protocole d'exclusion des robots. https://robots-txt.com/. Accessed: 2025-08-10.
[10] How to Block Bad Bots and Spiders using .htaccess. https://chemicloud.com/kb/article/block-bad-bots-and-spiders-using-htaccess/. Accessed: 2025-08-10.
[11] reCAPTCHA v3 documentation. https://developers.google.com/recaptcha/docs/v3. Accessed: 2025-08-10.
[12] Prosopo CAPTCHA. https://prosopo.io/. Accessed: 2025-08-10.
[13] Cloudflare Turnstile. https://www.cloudflare.com/application-services/products/turnstile/. Accessed: 2025-08-10.
[14] Anubis. https://anubis.techaro.lol/. Accessed: 2025-08-10.
[15] VASTEL, A., RUDAMETKIN, W., ROUVOY, R., AND BLANC, X. FP-Crawlers: Studying the Resilience of Browser Fingerprinting to Block Crawlers. In MADWeb’20 - NDSS Workshop on Measurements, Attacks, and Defenses for the Web (San Diego, United States, Feb. 2020), O. Starov, A. Kapravelos, and N. Nikiforakis, Eds.
[16] VENUGOPALAN, H., MUNIR, S., AHMED, S., WANG, T., KING, S. T., AND SHAFIQ, Z. Fp-inconsistent: Detecting evasive bots using browser fingerprint inconsistencies. CoRR abs/2406.07647 (2024).
[17] PAPADOGIANNAKI, E., AND IOANNIDIS, S. Pump up the JARM: studying the evolution of botnets using active TLS fingerprinting. In ISCC (2023), IEEE, pp. 764–770.
[18] Bot Fight Mode documentation. https://developers.cloudflare.com/bots/get-started/bot-fight-mode/. Accessed: 2025-08-10.
Copy of Work_in_Progress_Bots_Bots_Everywhere-Journée_SSLR_08-10-2025
By Iliana Fayolle