Cyber Threat Intelligence Investigation & Cloud-Based Web Scraping
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9274133/logo.png)
Hippolyte Quéré (Hippie)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9274140/profile.gif)
1. OSINT
2. Online storage website
3. Web scraping
1. OSINT
OSINT?
![](https://media2.giphy.com/media/xT9C25UNTwfZuk85WP/giphy.gif)
Open Source INTelligence
2. Online storage website
Online storage website
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9274163/logo.png)
Me
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9274173/logo_rhackgondins_2000.png)
My friend
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9274185/folder.png)
My needs
- Send huge files
- No registration
- I care about my personal data
- Free
anonfiles.com
- Free
- Anonymous
- 20GB file upload
- Unlimited bandwidth!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9274194/pasted-from-clipboard.png)
![](https://media3.giphy.com/media/xT0xeJpnrWC4XWblEk/giphy.gif)
Analyse the website
3. Scraping
Web scraping
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9307429/pngaaa.com-5401544.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9307430/pasted-from-clipboard.png)
❤️ podalirius.net
Legal disclaimer
- Legal as long as the information is public and contains no personal data
- No resale: reselling would infringe the copyright on the original data
- In practice, resale is possible, but complicated
- Many special cases
Ethical web scraping
- APIs are often the best solution
- Respect robots.txt files
- Read the Terms and Conditions
- Identify yourself with a User-Agent header
- Respect the data
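Two of these rules, respecting robots.txt and identifying yourself with a User-Agent, can be checked with Python's standard library alone. The sketch below is an illustration, not the scraper from this talk: the bot name, contact address, and robots.txt rules are hypothetical, and the rules are parsed from a string so the example runs without network access.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical User-Agent: name the bot and give a way to contact its operator.
USER_AGENT = "ExampleResearchBot/1.0 (+mailto:contact@example.org)"

# Hypothetical robots.txt content. Against a live site you would instead call
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str) -> bool:
    """True if robots.txt lets our User-Agent fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


def make_request(url: str) -> Request:
    """Build (but do not send) a request that identifies the scraper."""
    return Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    print(allowed(rules, "https://example.org/public/page"))   # True
    print(allowed(rules, "https://example.org/private/page"))  # False
    print(rules.crawl_delay(USER_AGENT))                        # 5
    print(make_request("https://example.org/").get_header("User-agent"))
```

The `Crawl-delay` value, when present, is a reasonable minimum pause between two requests to the same site.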
Let's scrape!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9307968/Screenshot_2022-02-02_at_16.04.05.png)
The ID is 10 characters long, containing only upper- and lower-case letters and digits
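The described format (10 characters drawn from upper- and lower-case letters and digits) is easy to express in code. This is a minimal sketch assuming the alphabet is exactly `A-Z`, `a-z` and `0-9`; the helper names `random_id` and `looks_like_id` are hypothetical and not taken from the author's scraper.

```python
import re
import secrets
import string

# Assumption: the alphabet is exactly A-Z, a-z and 0-9, as described on the slide.
ALPHABET = string.ascii_letters + string.digits  # 26 + 26 + 10 = 62 symbols
ID_LENGTH = 10
ID_PATTERN = re.compile(rf"[A-Za-z0-9]{{{ID_LENGTH}}}")  # "[A-Za-z0-9]{10}"


def random_id() -> str:
    """Draw one candidate ID uniformly at random from the 62-symbol alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def looks_like_id(candidate: str) -> bool:
    """Check the described format: exactly 10 ASCII letters or digits."""
    return ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    sample = random_id()
    print(sample, looks_like_id(sample))  # a random ID, then True
```

`secrets.choice` gives unpredictable IDs; for a pure speed benchmark, `random.choice` is a cheaper alternative with the same distribution.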
![](https://media0.giphy.com/media/d3mlE7uhX8KFgEmY/giphy.gif)
How long will it take?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308039/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308064/pasted-from-clipboard.png)
Quick math time
= 62^10 (10 characters, each one of 26 + 26 + 10 = 62 symbols)
= 839,299,365,868,340,224 (≈ 8.39 × 10^17)
100,000,000 IDs -> 22.10 seconds
= 185,485,159,856.9 seconds
≈ 2,146,818 days 23 h 30 min 57 s
≈ 5,878 years
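The estimate can be reproduced with exact arithmetic from the figures on the slide: a 62-symbol alphabet, 10 characters, and 100,000,000 IDs per 22.10 seconds. `Fraction` avoids floating-point rounding on the 18-digit keyspace; converting to years with 365.25-day years is an assumption of this sketch.

```python
from fractions import Fraction

ALPHABET_SIZE = 26 + 26 + 10          # lower-case + upper-case letters + digits
ID_LENGTH = 10
BATCH_SIZE = 100_000_000              # IDs per measured batch (slide figure)
BATCH_SECONDS = Fraction("22.10")     # measured time per batch (slide figure)
SECONDS_PER_YEAR = 31_557_600         # 365.25 days

keyspace = ALPHABET_SIZE ** ID_LENGTH                    # exact integer
total_seconds = keyspace * BATCH_SECONDS / BATCH_SIZE    # exact rational

days, rest = divmod(round(total_seconds), 86_400)
hours, rest = divmod(rest, 3_600)
minutes, seconds = divmod(rest, 60)
years = total_seconds / SECONDS_PER_YEAR

print(f"{keyspace:,} possible IDs")                          # 839,299,365,868,340,224 possible IDs
print(f"{float(total_seconds):,.1f} seconds")                # 185,485,159,856.9 seconds
print(f"{days:,} days {hours} h {minutes} min {seconds} s")  # 2,146,818 days 23 h 30 min 57 s
print(f"about {float(years):,.0f} years")                    # about 5,878 years
```

Since the run time scales linearly with the keyspace, each extra character multiplies it by 62, which is why brute-forcing the whole ID space is hopeless.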
How does my scraper work?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308189/Screenshot_2022-02-02_at_17.19.23.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308184/pasted-from-clipboard.png)
Summary of what you have to deal with
- Bypassing request-rate limits based on User-Agent, language, country, keywords...
- IP bans
- CAPTCHAs
- Error handling
- Changes in the detection system
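For the error-handling item, a common pattern is to retry rate-limited (HTTP 429) and transient server errors with capped exponential backoff plus random jitter, instead of hammering the server. This is a generic sketch, not the author's implementation: `fetch` is a hypothetical callable that performs one request and returns its status code, and `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable

# Statuses worth retrying: rate limiting (429) and transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


def fetch_with_backoff(
    fetch: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `fetch` (hypothetical: performs one request, returns its HTTP status)
    and retry retryable statuses with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status = fetch()
        if status not in RETRYABLE:
            return status  # success, or an error that retrying will not fix
        if attempt < max_attempts - 1:
            # "Full jitter": wait a random time in [0, cap], where the cap doubles
            # after each failure (1 s, 2 s, 4 s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    return status  # still failing: let the caller log or requeue the job


if __name__ == "__main__":
    responses = iter([429, 503, 200])  # simulated: two failures, then success
    print(fetch_with_backoff(lambda: next(responses), sleep=lambda s: None))  # 200
```

If the server sends a `Retry-After` header with a 429 or 503 response, waiting that long is usually a better choice than the computed delay.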
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308173/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308152/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9280482/pasted-from-clipboard.png)
Data results
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308542/Screenshot_2022-02-02_at_20.19.14.png)
Is Google lying?
I was stuck on a result: lumendatabase.org/XXXX
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308561/Screenshot_2022-02-02_at_20.30.15.png)
Future improvements
- Creating a job list with multiple pre-generated dorks
- Handling multiple slaves (worker nodes)
- Visualizing my results
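The first two improvements fit together: pre-generate every search query ("dork") once, then split that job list across worker nodes. This sketch assumes `site:` plus quoted-keyword queries; the keywords and worker names are hypothetical, and round-robin assignment is just the simplest possible scheme.

```python
from itertools import cycle, product

# Hypothetical inputs, for illustration only.
SITES = ["anonfiles.com"]
KEYWORDS = ["backup", "database", "config"]
WORKERS = ["worker-1", "worker-2"]


def build_dorks(sites: list[str], keywords: list[str]) -> list[str]:
    """Pre-generate one search query ("dork") per site/keyword combination."""
    return [f'site:{site} "{keyword}"' for site, keyword in product(sites, keywords)]


def assign_jobs(dorks: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Split the job list across workers in round-robin order."""
    if not workers:
        raise ValueError("need at least one worker")
    jobs: dict[str, list[str]] = {worker: [] for worker in workers}
    for worker, dork in zip(cycle(workers), dorks):
        jobs[worker].append(dork)
    return jobs


if __name__ == "__main__":
    for worker, queries in assign_jobs(build_dorks(SITES, KEYWORDS), WORKERS).items():
        print(worker, queries)
```

A static split is fine when every query costs about the same; if some workers get rate-limited more than others, a shared job queue that idle workers pull from balances the load better.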
Conclusion
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/9308580/pasted-from-clipboard.png)
The end
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1917718/images/8738400/Plan_de_travail_1.png)
Rhackgondins ❤
![](https://media4.giphy.com/media/l0HlT86IOp6nE9he0/giphy.gif)