Cyber Threat Intelligence Investigation & Cloud based Web Scraping

Hippolyte Quéré (Hippie)

Cybersecurity student at ESAIP

CTF Player @Rhackgondins

https://hippie.cat

@hiippiiie

1. OSINT
2. Online storage website
3. Web scraping

1. OSINT

OSINT?

Open Source INTelligence

2. Online storage website

Online storage website

Me

My friend

My needs

  • Send huge files
  • No registration
  • I care about my personal data
  • Free

anonfiles.com

- Free

- Anonymous

- 20GB file upload

- Unlimited bandwidth !!!!!!

Analyse the website

3. Scraping

Web scraping

❤️ podalirius.net

Legal disclaimer

- Legal as long as the information is public and contains no personal data
- No resale possible, because it would infringe the copyright on the original data
- In reality it is possible, but complicated
- Many special cases

Ethical web scraping

- APIs are often the best solution
- Respect the robots.txt file
- Read the Terms and Conditions
- Identify yourself with a user-agent
- Respect the data
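Two of the points above, respecting robots.txt and identifying yourself with a user-agent, can be sketched with Python's standard library. This is only an illustration, not the scraper from the talk: the `USER_AGENT` string, the `is_allowed` helper and the sample robots.txt content are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identity string: replace with your own project name and contact.
USER_AGENT = "example-research-bot/0.1 (+https://example.org/contact)"


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt content lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "https://example.org/public/page"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.org/private/page"))
```

In a real scraper the robots.txt content would be downloaded from the target site first, and the same `USER_AGENT` value would be sent in the `User-Agent` header of every request.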

Let's scrape!

10 characters long, containing only upper- and lower-case letters and digits

How long will it take?

Quick math time

= 62^10 (size)
= 839,299,365,868,340,224 (≈ 8.39 × 10^17)

100,000,000 -> 22.10 seconds (rounded to 22 s below)
= 184,645,860,491.03 seconds
= 2,137,104 days 20 h 48 min 11 s
≈ 5,850 years
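The estimate can be re-derived in a few lines of Python. This is a sketch under stated assumptions: 22 seconds per 100,000,000 iterations (the rounded timing that the totals above correspond to) and a year of 365.25 days. The constant and function names are chosen for the example.

```python
import string

# Upper-case letters, lower-case letters and digits: 26 + 26 + 10 = 62 symbols.
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 10

# Number of possible 10-character strings over the 62-symbol alphabet.
keyspace = len(ALPHABET) ** ID_LENGTH

# Assumption: 100,000,000 iterations take about 22 seconds.
BATCH_SIZE = 100_000_000
SECONDS_PER_BATCH = 22

# Integer arithmetic keeps the result exact (fractions of a second are dropped).
total_seconds = keyspace * SECONDS_PER_BATCH // BATCH_SIZE


def split_duration(seconds: int) -> tuple[int, int, int, int]:
    """Split a number of seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


# Assumption: one year is counted as 365.25 days.
years = total_seconds / (365.25 * 86_400)

print(f"{keyspace:,} possible identifiers")
print(f"{total_seconds:,} seconds")
print("days, hours, minutes, seconds:", split_duration(total_seconds))
print(f"about {years:,.0f} years")
```

Whatever the exact timing per batch, the order of magnitude stays the same: enumerating the whole space sequentially would take thousands of years.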

How does my scraper work?

Summary of what you have to deal with

  1. Request rate limits based on UA, language, country, keywords ...
  2. IP banning
  3. Captchas
  4. Error handling
  5. Changes in the detection system
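For the error-handling point, one common pattern is to retry a failed request with an exponentially growing pause, which also keeps the request rate low. The sketch below is a generic illustration and not the scraper from the talk: `fetch_with_retry`, its parameters and the fake `flaky_fetch` function are hypothetical names, and the actual HTTP call is passed in as a function so the example runs without network access.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying on any exception with exponential backoff."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            # Wait 1x, 2x, 4x ... base_delay, but not after the final attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts") from last_error


# Demo with a fake fetch function that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky_fetch(url: str) -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# delays.append stands in for time.sleep so the demo finishes instantly.
result = fetch_with_retry(flaky_fetch, "https://example.org/", sleep=delays.append)
print(result, delays)
```

Injecting `fetch` and `sleep` is a design choice for testability; in a real scraper `fetch` would wrap the HTTP client and `sleep` would be left at its default.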

Data results

Is Google lying?

I was stuck on a result: lumendatabase.org/XXXX

Future improvements

- Creating a job list with multiple pre-generated dorks

- Handling multiple slaves

- Data visualization of my results
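The first improvement, a job list of pre-generated dorks, could look roughly like the sketch below. It assumes that a dork is a search query built from the `site:` operator and a quoted keyword. The `build_dork_jobs` helper and the keywords are hypothetical and only illustrate the idea.

```python
from collections import deque
from itertools import product


def build_dork_jobs(sites: list[str], keywords: list[str]) -> deque[str]:
    """Pre-generate one search query (dork) per site/keyword pair."""
    return deque(
        f'site:{site} "{keyword}"' for site, keyword in product(sites, keywords)
    )


# Hypothetical keywords, for illustration only.
jobs = build_dork_jobs(["anonfiles.com"], ["invoice", "backup"])

# Each worker would take the next job from the left of the queue.
processed = []
while jobs:
    processed.append(jobs.popleft())

print(processed)
```

With several workers, a shared queue such as this one would let each of them pick the next pending dork without repeating the work of the others.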

Conclusion

The end

Rhackgondins ❤
