Cyber Threat Intelligence Investigation & Cloud-based Web Scraping
Hippolyte Quéré (Hippie)
1. OSINT
2. Online storage website
3. Web scraping
1. OSINT
OSINT?
Open Source INTelligence
2. Online storage website
Online storage website
Me
My friend
My needs
- Send huge files
- No registration
- I care about my personal data
- Free
anonfiles.com
- Free
- Anonymous
- 20GB file upload
- Unlimited bandwidth!
Analyse the website
3. Scraping
Web scraping
❤️ podalirius.net
Legal disclaimer
- Legal as long as the information is public and contains no personal data
- No resale: reselling would infringe the copyright on the original data
- In reality resale is possible, but complicated
- Many special cases
Ethical web scraping
- APIs are often the best solution
- Respect robots.txt files
- Read the Terms and Conditions
- Identify yourself with a User-Agent
- Respect the data
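The robots.txt and User-Agent points can be checked programmatically. Below is a minimal sketch using Python's standard `urllib.robotparser`; the bot name and contact URL in `USER_AGENT` and the sample rules are hypothetical placeholders. Against a real site you would call `set_url()` and `read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity: an honest name plus a way to contact you.
USER_AGENT = "HippieBot/1.0 (+https://example.com/contact)"

# Hypothetical robots.txt content; for a live site use
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str) -> bool:
    """Return True if our User-Agent may fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    print(allowed(parser, "https://example.com/public/page"))   # True
    print(allowed(parser, "https://example.com/private/data"))  # False
    print(parser.crawl_delay(USER_AGENT))                       # 5
```

The same `USER_AGENT` string would then be sent as the `User-Agent` header of every request, so the site owner can see who is scraping and honour the `Crawl-delay`.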
Let's scrape!
10 characters long, containing only uppercase letters, lowercase letters and digits
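A hypothetical sketch of how one candidate identifier could be drawn from that 62-symbol alphabet; `ALPHABET`, `ID_LENGTH` and `random_id` are illustrative names, not taken from the original scraper.

```python
import secrets
import string

# Alphabet from the slide: uppercase + lowercase letters + digits = 62 symbols.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
ID_LENGTH = 10


def random_id(length: int = ID_LENGTH) -> str:
    """Draw one candidate identifier uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(len(ALPHABET))                # 62
    print(len(ALPHABET) ** ID_LENGTH)   # size of the search space
    print(random_id())
```

Drawing IDs at random does not shrink the search space: with 62^10 possibilities, almost every guess points at nothing, which is what the next slide quantifies.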
How long will it take?
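Quick math time
= 62^10 (62 possible characters, 10 characters long)
= 839,299,365,868,340,224 (≈ 8.39 × 10^17) combinations
100,000,000 combinations -> 22.10 seconds (rounded to 22 s below)
= 839,299,365,868,340,224 / 100,000,000 × 22 = 184,645,860,491.03 seconds
= 2,137,104 days 20 h 48 min 11 s
≈ 5,851 years (365.25-day years)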
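The same estimate as a small Python sketch, assuming the slide's figures (62 symbols, length 10, 100,000,000 combinations per 22 s). It uses integer arithmetic so the 18-digit count stays exact; `brute_force_estimate` is a hypothetical name.

```python
# Inputs taken from the slide.
ALPHABET_SIZE = 62
ID_LENGTH = 10
BATCH = 100_000_000
BATCH_SECONDS = 22  # measured 22.10 s; the totals on the slide use 22 s


def brute_force_estimate(alphabet_size=ALPHABET_SIZE, length=ID_LENGTH,
                         batch=BATCH, batch_seconds=BATCH_SECONDS):
    """Return (combinations, seconds, (d, h, m, s), years) for a full brute force."""
    combinations = alphabet_size ** length
    # Exact integer arithmetic, floored to whole seconds.
    seconds = combinations * batch_seconds // batch
    days, rest = divmod(seconds, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, secs = divmod(rest, 60)
    years = seconds / (365.25 * 86_400)
    return combinations, seconds, (days, hours, minutes, secs), years


if __name__ == "__main__":
    combos, secs, dhms, years = brute_force_estimate()
    print(f"{combos:,}")  # 839,299,365,868,340,224
    print(f"{secs:,}")    # 184,645,860,491
    print(dhms)           # (2137104, 20, 48, 11)
    print(round(years))   # 5851
```

Changing `BATCH_SECONDS` or `ID_LENGTH` shows how quickly the estimate moves: each extra character multiplies the time by 62.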
How does my scraper work?
Summary of what you have to deal with
- Bypassing request-rate limits based on UA, language, country, keywords...
- IP bans
- CAPTCHAs
- Error handling
- Changes in the detection system
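For the error-handling and rate-limit points, a common approach is to retry transient failures with exponential backoff. This is a minimal sketch under stated assumptions: `fetch` is a hypothetical callable you supply (wrapping whatever HTTP client you use) that returns a status code and a body, the set of retryable status codes is a choice rather than a standard, and the bot identity is a placeholder. It does not solve CAPTCHAs or IP bans; it only slows down when the server pushes back.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical identifying User-Agent (see the ethical scraping slide).
USER_AGENT = "HippieBot/1.0 (+https://example.com/contact)"

# Status codes treated as "slow down / try again later".
RETRYABLE_STATUS = {429, 500, 502, 503}

# fetch(url, user_agent) -> (status_code, body); injected so any HTTP client fits.
Fetch = Callable[[str, str], Tuple[int, str]]


def fetch_with_backoff(fetch: Fetch, url: str, max_retries: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Fetch url, retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            status, body = fetch(url, USER_AGENT)
        except OSError:
            status, body = None, ""  # network failure: treat as transient
        if status == 200:
            return body
        if status is not None and status not in RETRYABLE_STATUS:
            raise RuntimeError(f"HTTP {status} for {url}: not retrying")
        if attempt < max_retries - 1:
            # Wait base_delay * 2^attempt seconds plus up to 1 s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts")
```

Injecting `sleep` keeps the function testable without real waiting; in production the default `time.sleep` is used.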
Data results
Is Google lying?
I got stuck on one result: lumendatabase.org/XXXX
Future improvements
- Create a job list of pre-generated dorks
- Handle multiple slaves
- Visualize my results
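The first two improvements could be prototyped with a shared job queue. In this hypothetical sketch, threads stand in for slave machines and `handle` stands in for whatever runs one dork (search, parse, store); the `site:` query format and the keywords are illustrative assumptions, not the author's actual dorks.

```python
import queue
import threading
from typing import Callable, Dict, List


def build_dorks(site: str, keywords: List[str]) -> List[str]:
    """Pre-generate one dork per keyword (hypothetical query format)."""
    return [f'site:{site} "{kw}"' for kw in keywords]


def run_jobs(dorks: List[str], worker_count: int,
             handle: Callable[[str], int]) -> Dict[str, int]:
    """Distribute dorks to worker threads through a shared job queue."""
    jobs = queue.Queue()
    for dork in dorks:
        jobs.put(dork)
    results: Dict[str, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                dork = jobs.get_nowait()
            except queue.Empty:
                return  # no jobs left: this worker stops
            outcome = handle(dork)
            with lock:
                results[dork] = outcome
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With several machines the pattern stays the same: the master fills the queue once, and each slave pulls jobs until it is empty. A networked queue would replace `queue.Queue` in that setup.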
Conclusion
The end
Rhackgondins ❤