Bots, Bots Everywhere

Work in Progress

Iliana FAYOLLE*, Sihem BOUHENNICHE*, Samuel PELISSIER, Clémentine MAURICE, Walter RUDAMETKIN

GDR Sécurité - Journées SSLR

9 October 2025

* These authors contributed equally to this work.

FACT

A growing share of global internet traffic comes from bots, especially bad ones. [1, 2]

BAD BOT IMPERVA REPORT [1]

FACT

[3, 4, 5, 6, 7, 8]

robots.txt [8, 9]

🤖 Nice_bot: Hi! Send me your robots.txt please. :)

💻 Web server: Hi! Sure! Here it is:

📃 robots.txt

User-Agent: Nice_bot
User-Agent: Mean_bot
Disallow: /

🤖 Nice_bot: Ok! Bye!

🤖 Mean_bot: Send me your robots.txt

💻 Web server: Hi! Sure! Here it is:

📃 robots.txt

User-Agent: Nice_bot
User-Agent: Mean_bot
Disallow: /

🤖 Mean_bot: Whatever.
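The exchange above can be reproduced with Python's standard `urllib.robotparser`. This is a minimal sketch: the name `Other_bot` and the path `/index.html` are illustrative assumptions, not part of the study. It shows that robots.txt only states a policy. Checking it is the crawler's own choice, which is why Mean_bot can simply ignore the answer.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt served in the exchange above.
ROBOTS_TXT = """\
User-Agent: Nice_bot
User-Agent: Mean_bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(bot_name: str, path: str = "/index.html") -> bool:
    """Return True if robots.txt allows bot_name to fetch path."""
    return parser.can_fetch(bot_name, path)


# A polite crawler runs this check before fetching; nothing forces it to.
for bot in ("Nice_bot", "Mean_bot", "Other_bot"):
    print(bot, "allowed" if may_crawl(bot) else "disallowed")
```

Both listed bots are disallowed everywhere, while a bot that is not listed (here the hypothetical `Other_bot`) falls through to the default and is allowed.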

robots.txt [8, 9]

💻

Web server

...

🤖


IPs & User-Agents blocking [10]

💻 Web server: Your ID please.

🤖: Sure.
IP: 20.42.10.176
UA: GPTBOT

💻 Web server: No robots allowed.

💻 Web server: Your ID please.

🤖 Disguised_bot: Sure.
IP: 20.42.10.176
UA: Mozilla/5.0 (iPhone14,2; *******)

💻 Web server: Welcome!
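A minimal sketch of the two exchanges above, assuming a server that blocks on a User-Agent keyword and, optionally, on an IP list. The function name `is_blocked`, the keyword list, and the optional IP blocklist are hypothetical and chosen for illustration only.

```python
def is_blocked(ip: str, user_agent: str,
               blocked_ips=frozenset(),
               blocked_ua_keywords=("gptbot",)) -> bool:
    """Block a request whose IP is listed or whose UA contains a listed keyword."""
    if ip in blocked_ips:
        return True
    ua = user_agent.lower()
    return any(keyword in ua for keyword in blocked_ua_keywords)


IP = "20.42.10.176"
DISGUISED_UA = "Mozilla/5.0 (iPhone14,2; *******)"

# The honest bot announces itself and is rejected.
print(is_blocked(IP, "GPTBOT"))
# The same client with a browser-like UA passes a UA-only rule.
print(is_blocked(IP, DISGUISED_UA))
# It is caught only if its IP is also on the blocklist.
print(is_blocked(IP, DISGUISED_UA, blocked_ips={IP}))
```

Because the User-Agent is chosen by the client, a UA rule only stops bots that identify themselves honestly.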


Challenges solutions: CAPTCHAs [11, 12, 13] & Proof of Work [14]

💻 Web server: Please solve this.

🤖: 💧💧💧
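To illustrate the proof-of-work idea, here is a generic hashcash-style sketch. It is not the protocol of Anubis or of any other product: the challenge string, the difficulty value, and the function names `solve` and `verify` are assumptions made for the example. The client must search for a nonce, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def _digest(challenge: str, nonce: int) -> str:
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: find the first nonce whose hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


CHALLENGE = "example-challenge"  # hypothetical per-visitor value
DIFFICULTY = 3                   # about 16**3 hashes on average

nonce = solve(CHALLENGE, DIFFICULTY)
print(nonce, verify(CHALLENGE, nonce, DIFFICULTY))
```

Each extra zero digit multiplies the expected client work by 16, while the verification cost stays constant.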

Fingerprinting strategies: TLS [17] & Browser [7, 15, 16]

💻 Web server: Your ID please.

🤖 Disguised_bot:
IP: 20.42.10.176
UA: Mozilla/5.0 (iPhone14,2; *******)
Screen width: 1920;
Screen height: 1080;
Timezone: UTC+02:00;

💻 Web server: Hmm...
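The server's doubt above comes from an inconsistency: the User-Agent claims an iPhone, but the reported screen is desktop-sized. A minimal sketch of such a check follows. The single rule and the threshold `MAX_PHONE_SIDE` are assumptions for illustration; real fingerprinting compares many more attributes.

```python
# Assumed threshold: a phone's screen, as reported to JavaScript, is well
# below this size on its longest side. The value is illustrative.
MAX_PHONE_SIDE = 1000


def is_inconsistent(user_agent: str, screen_width: int, screen_height: int) -> bool:
    """Flag a client whose claimed device does not match its reported screen."""
    claims_iphone = "iphone" in user_agent.lower()
    longest_side = max(screen_width, screen_height)
    return claims_iphone and longest_side > MAX_PHONE_SIDE


DISGUISED_UA = "Mozilla/5.0 (iPhone14,2; *******)"

# Values from the exchange above: an "iPhone" with a 1920x1080 screen.
print(is_inconsistent(DISGUISED_UA, 1920, 1080))
```

Changing the User-Agent string is cheap, whereas keeping every exposed attribute consistent with it takes more effort.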

Proprietary or commercial solutions: Cloudflare [8, 18]

💻 Web server: I'll check you out first.

🤖: 💧

RESEARCH QUESTIONS

🔍 RQ1: What is the best technique for bot detection?

RQ2: Does combining techniques improve detection (especially against AI bots)?

🤖 RQ3: Do AI-driven bots have unique fingerprints compared to traditional bots?

METHODOLOGY

💻 Web server (main server)

- Visited by humans & bots
- 3,000 daily visits
- No defenses against bots
- Collects browser, HTTP & TLS fingerprints
- Invisibly linked to the honeypot 🍯

💻 Web server (honeypot 🍯)

- Visited by bots only
- Collects browser, HTTP & TLS fingerprints
- Acts as a reverse proxy toward 9 servers with different defenses against bots

Fingerprint layers

- TLS: Ciphersuites, Extensions ...
- HTTP headers: IPs, User-Agents ...
- JS: OS, Window size ...
- Time
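One way to picture what is collected per visit is a record grouped by layer. The sketch below is only an assumption about how such data could be organized. The class name, field names, and sample values are hypothetical and do not describe the actual dataset schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConnectionFingerprint:
    """One visit, grouped by the layer each attribute is captured at."""
    # HTTP layer: available for every request.
    ip: str
    user_agent: str = ""
    # TLS layer: captured during the handshake.
    ciphersuites: list = field(default_factory=list)
    tls_extensions: list = field(default_factory=list)
    # JS layer: stays None when the client never runs the script.
    os: Optional[str] = None
    window_size: Optional[tuple] = None
    # Time of the connection (UTC).
    seen_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def layers_present(self) -> set:
        """Return the layers for which this visit produced data."""
        layers = {"http", "time"}
        if self.ciphersuites or self.tls_extensions:
            layers.add("tls")
        if self.os is not None or self.window_size is not None:
            layers.add("js")
        return layers


# Illustrative visit: a client that completed TLS and HTTP but ran no JavaScript.
visit = ConnectionFingerprint(ip="192.0.2.10", user_agent="GPTBot/1.0",
                              ciphersuites=["TLS_AES_128_GCM_SHA256"])
print(sorted(visit.layers_present()))
```

Keeping the layers separate makes it visible when a visitor leaves before the JS fingerprint is collected.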

METHODOLOGY

💻 Web server in front of 9 servers 💻, each with its own defense:

- robots.txt
- nginx rules
- reCAPTCHA v3
- ProCAPTCHA
- Cloudflare Turnstile
- Anubis
- Mix
- Mix
- Cloudflare anti-bot mechanisms


OUR FUTURE CONTRIBUTIONS

📊 C1: Build a large anonymized dataset of human and bot fingerprints (HTTP, TLS, browser) to reveal key connection patterns.

🧩 C2: Define criteria to distinguish humans, traditional bots, and LLM-driven bots across multiple fingerprinting levels.

⚙️ C3: Evaluate the performance of each detection method and the benefits of combining them.

RESULTS SO FAR


🔎 Identify Bots

No evidence of bots impersonating other bots so far

→ Some IPs use a different UA each time

→ Many connections without UA or cookies

- Unique IPs: 5667
- Unique UAs: 4153
- Total connections (IP + UA + cookie + local time): 32371

→ Only 12% of unique IDs self-identify as bots
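Counts like the ones above can be derived from a connection log with a few set operations. The sketch below runs on toy data, not on the study's dataset. The record layout, the keyword list `BOT_KEYWORDS`, and the choice to measure self-identification over unique UAs are all assumptions.

```python
from collections import defaultdict

# Assumed markers of a self-identifying bot in the User-Agent.
BOT_KEYWORDS = ("bot", "crawler", "spider")


def summarize(connections):
    """Compute simple per-log statistics from records with 'ip' and 'ua' keys."""
    ips = {c["ip"] for c in connections}
    uas = {c["ua"] for c in connections if c["ua"]}
    uas_per_ip = defaultdict(set)
    for c in connections:
        if c["ua"]:
            uas_per_ip[c["ip"]].add(c["ua"])
    declared = {ua for ua in uas if any(k in ua.lower() for k in BOT_KEYWORDS)}
    return {
        "unique_ips": len(ips),
        "unique_uas": len(uas),
        "missing_ua": sum(1 for c in connections if not c["ua"]),
        # IPs seen with more than one User-Agent.
        "ua_switching_ips": {ip for ip, seen in uas_per_ip.items() if len(seen) > 1},
        "self_identified_share": len(declared) / len(uas) if uas else 0.0,
    }


# Toy log using documentation-range IP addresses.
toy_log = [
    {"ip": "192.0.2.1", "ua": "GPTBot/1.0"},
    {"ip": "192.0.2.1", "ua": "Mozilla/5.0"},
    {"ip": "192.0.2.2", "ua": ""},
    {"ip": "198.51.100.7", "ua": "Mozilla/5.0"},
]
stats = summarize(toy_log)
print(stats)
```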

RESULTS SO FAR


🛡️ Evaluate Defenses

Some bots seem to evade robots.txt and rules by switching user-agents.

No bot actually tests all defense layers.

Certain defenses have not been bypassed by any bot.

LIMITS

❌ No AI challenge testing

📉 Limited bot connections

🔍 No ground truth for human vs bot

⏱️ Some bots leave before fingerprinting or exit challenge pages early

🤏 Few connections from main server

🛡️ No bots test all defenses

CONCLUSION & FUTURE WORK


Continue passive data collection

Improve bot recognition

Track behavior on the site over time

Active testing of AI bots

REFERENCES

[1] BAD BOT IMPERVA REPORT. https://www.imperva.com/resources/wp-content/uploads/sites/6/reports/2025-Bad-Bot-Report.pdf. Accessed: 2025-06-10.

[2] CLOUDFLARE RADAR BOTS. https://radar.cloudflare.com/bots. Accessed: 2025-06-10.

[3] https://forum.chatons.org/t/mise-a-mal-des-forges-git-par-les-collecteurs-dia/7086. Accessed: 2025-08-10.

[4] https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html. Accessed: 2025-08-10.

[5] Declare your AIndependence: block AI bots, scrapers and crawlers with a single click. https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/. Accessed: 2025-08-10.

[6] LI, X., AZAD, B. A., RAHMATI, A., AND NIKIFORAKIS, N. Good bot, bad bot: Characterizing automated browsing activity. In SP (2021), IEEE, pp. 1589–1605.

[7] AZAD, B. A., STAROV, O., LAPERDRIX, P., AND NIKIFORAKIS, N. Web runner 2049: Evaluating third-party anti-bot services. In DIMVA (2020), vol. 12223 of Lecture Notes in Computer Science, Springer, pp. 135–159.

[8] LIU, E., LUO, E., SHAN, S., VOELKER, G. M., ZHAO, B. Y., AND SAVAGE, S. Somesite I used to crawl: Awareness, agency and efficacy in protecting content creators from AI crawlers. CoRR abs/2411.15091 (2024).

[9] Robots.txt. Introduction au protocole d'exclusion des robots [Introduction to the robots exclusion protocol]. https://robots-txt.com/. Accessed: 2025-08-10.

[10] How to Block Bad Bots and Spiders using .htaccess. https://chemicloud.com/kb/article/block-bad-bots-and-spiders-using-htaccess/. Accessed: 2025-08-10.

[11] reCAPTCHA v3 documentation. https://developers.google.com/recaptcha/docs/v3. Accessed: 2025-08-10.

[12] Prosopo CAPTCHA. https://prosopo.io/. Accessed: 2025-08-10.

[13] Cloudflare Turnstile. https://www.cloudflare.com/application-services/products/turnstile/. Accessed: 2025-08-10.

[14] Anubis. https://anubis.techaro.lol/. Accessed: 2025-08-10.

[15] VASTEL, A., RUDAMETKIN, W., ROUVOY, R., AND BLANC, X. FP-Crawlers: Studying the Resilience of Browser Fingerprinting to Block Crawlers. In MADWeb’20 - NDSS Workshop on Measurements, Attacks, and Defenses for the Web (San Diego, United States, Feb. 2020), O. Starov, A. Kapravelos, and N. Nikiforakis, Eds.

[16] VENUGOPALAN, H., MUNIR, S., AHMED, S., WANG, T., KING, S. T., AND SHAFIQ, Z. Fp-inconsistent: Detecting evasive bots using browser fingerprint inconsistencies. CoRR abs/2406.07647 (2024).

[17] PAPADOGIANNAKI, E., AND IOANNIDIS, S. Pump up the JARM: studying the evolution of botnets using active TLS fingerprinting. In ISCC (2023), IEEE, pp. 764–770.

[18] Bot Fight Mode documentation. https://developers.cloudflare.com/bots/get-started/bot-fight-mode/. Accessed: 2025-08-10.
