https://github.com/samuelklam/web-scraping
Cheerio + Request
Phantom + Casper
- each step is simpler because existing modules written by other developers already provide the functionality
Extract:
Express, Chalk
Request - a simple way to make HTTP calls and fetch the HTML of a web page
Cheerio - takes raw HTML, parses it, and returns a jQuery-like object, allowing you to traverse the DOM
- difficult to handle pages that rely heavily on AJAX
- difficult to handle pagination
- works well with static pages
>> brew install phantomjs
>> brew install casperjs
Grab link results from Google Search for 'javascript' and 'python'
GitHub
- https://github.com/samuelklam/web-scraping
Request
- https://github.com/request/request
Cheerio
- https://github.com/cheeriojs/cheerio
Phantom
- https://phantomjs.org
Casper
- https://casperjs.org
Import.io
- https://www.import.io/
Kimono
- acquired by Palantir
- https://www.kimonolabs.com/
ParseHub
- https://parsehub.com/