Web scraping

by: sjdonado

What is it?

Scraping is the process of using bots to extract content and data from a website.

Source: https://www.incapsula.com/web-application-security/web-scraping-attack.html

Scraper tools and bots

  • Recognize unique HTML site structures
  • Extract and transform content
  • Store scraped data
  • Extract data from APIs
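
A minimal Python sketch of the last three steps. It assumes the data comes from a JSON API; the payload, its field names and the `products` table are hypothetical. The sketch parses the response and transforms it by trimming names, converting prices to numbers and dropping incomplete records. It then stores the result in an in-memory SQLite database.

```python
import json
import sqlite3

# Hypothetical payload standing in for a response from a site's JSON API.
API_RESPONSE = """
[
  {"name": "  Widget ", "price": "19.99"},
  {"name": "Gadget", "price": "5"},
  {"name": "Broken", "price": null}
]
"""


def transform(raw):
    """Trim names, convert prices to floats and drop records without a price."""
    rows = []
    for item in raw:
        if item.get("price") is None:
            continue  # incomplete record: skip it
        rows.append((item["name"].strip(), float(item["price"])))
    return rows


def store(rows, conn):
    """Save the cleaned rows in a simple (hypothetical) products table."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
    store(transform(json.loads(API_RESPONSE)), conn)
    print(conn.execute("SELECT name, price FROM products").fetchall())
    # [('Widget', 19.99), ('Gadget', 5.0)]
```

Swapping `":memory:"` for a file path would keep the scraped data between runs.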

Scraper libraries

Scraper types

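  • Legitimate bots identify themselves: Googlebot declares in its HTTP User-Agent header that it belongs to Google, and it follows the site's robots.txt rules
  • Malicious bots impersonate legitimate traffic by sending a false HTTP User-Agent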
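
A sketch of how a well-behaved bot can apply both ideas using only Python's standard library. It sends an honest `User-Agent` and checks the robots.txt rules before building a request. The bot name, contact URL and rules are hypothetical. A real bot would download the rules from the site's `/robots.txt` instead of using a string.

```python
import urllib.request
import urllib.robotparser
from typing import Optional

# Hypothetical bot identity: a legitimate bot names itself and gives a contact URL.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot)"

# Hypothetical rules; a real bot would fetch them from https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Load robots.txt rules from a string instead of the network."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def make_request(
    url: str, rules: urllib.robotparser.RobotFileParser
) -> Optional[urllib.request.Request]:
    """Build a request with an honest User-Agent, or return None if robots.txt disallows the URL."""
    if not rules.can_fetch(USER_AGENT, url):
        return None
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/index.html", "https://example.com/private/data.html"):
        req = make_request(url, rules)
        print(url, "->", "allowed" if req is not None else "disallowed by robots.txt")
```

The `User-Agent` header is just a string chosen by the client. A malicious bot can send any value, so the header alone cannot prove who is scraping.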

Web scraping protection

Server-side scraping

  • Cheerio.js
  • PhantomJS
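
Cheerio.js parses static HTML on the server and exposes a jQuery-like API for selecting elements. PhantomJS is a headless browser, so it can also run the page's JavaScript. The sketch below uses Python's built-in `html.parser` as a stand-in for Cheerio-style extraction; it is not Cheerio's API. It collects the `<a class="item">` links from a hypothetical page. The link added by the inline script is not found, because a static parser never executes JavaScript. Content like that needs a headless browser.

```python
from html.parser import HTMLParser

# Hypothetical page: two "item" links in the static markup, plus one that only
# exists after the inline script runs in a browser.
PAGE = """
<html><body>
  <a class="item" href="/a">First</a>
  <a class="item" href="/b">Second</a>
  <a href="/about">About</a>
  <script>document.body.innerHTML += '<a class="item" href="/c">Third</a>';</script>
</body></html>
"""


class ItemLinkParser(HTMLParser):
    """Collect (href, text) pairs for <a> tags whose class list contains "item"."""

    def __init__(self):
        super().__init__()
        self.links = []     # collected (href, text) pairs
        self._href = None   # href of the item link we are currently inside, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "item" in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Script bodies also arrive here as plain text, but outside an item link they are ignored.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_item_links(html):
    """Parse static HTML only; JavaScript in the page is never executed."""
    parser = ItemLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(extract_item_links(PAGE))  # [('/a', 'First'), ('/b', 'Second')]
```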

By Juan Rodriguez
