KIMONO

SCRAPING THE WEB

SCRAPING PRIMER

WEB CRAWLING

The process of iteratively finding and fetching web links, starting from a list of seed URLs.

WEB SCRAPING

The process of parsing a web document and extracting information from it.
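The two definitions can be sketched in a few lines of Python. This is only an illustration, not how Kimono works internally: the `PAGES` dictionary is a hypothetical in-memory site standing in for real HTTP fetches, and `scrape` and `crawl` are names made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<h1>Home</h1><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<h1>Page A</h1><a href="/b">B</a>',
    "http://example.test/b": '<h1>Page B</h1><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects the <h1> text and every link target found on one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def scrape(url, html):
    """Scraping: extract information (title, absolute links) from one document."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.title, [urljoin(url, href) for href in parser.links]


def crawl(seeds, fetch):
    """Crawling: start from seed URLs and keep following links not seen before."""
    seen = set(seeds)
    queue = deque(seeds)
    titles = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched; skip it
            continue
        title, links = scrape(url, html)
        titles[url] = title
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles


titles = crawl(["http://example.test/"], PAGES.get)
```

Passing `fetch` in as a parameter keeps the crawl loop separate from the network layer, so a real HTTP client could be swapped in for `PAGES.get`.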

ETHICS OF SCRAPING

"There is absolutely no technical difference between an automated computer viewing a website and a human-driven computer viewing a website."

ETHICAL CHALLENGES

  1. Affecting the experience of others by hitting the server too hard
     
  2. Certain uses of data may be copyright violations
     
  3. Breaking ToS is not illegal, but it may be considered a breach of contract
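One way to reduce the risk of the first challenge is to honour the site's robots.txt and to pause between requests. A minimal sketch using Python's standard `urllib.robotparser` follows. The robots.txt content, the `example-scraper` user agent and the `Throttle` helper are assumptions made up for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, parsed from a string so no network call is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-scraper"  # made-up user agent name

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def allowed(url):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return robots.can_fetch(USER_AGENT, url)


class Throttle:
    """Ensures at least `delay` seconds pass between consecutive requests."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


# Use the site's requested delay if it gives one, otherwise fall back to 1 second.
throttle = Throttle(robots.crawl_delay(USER_AGENT) or 1)
```

A scraper would call `allowed(url)` before each fetch and `throttle.wait()` immediately before sending the request.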

CREATE A SCRAPER

STEP ONE

DON'T CREATE A SCRAPER

... if copy/paste is faster

DON'T CREATE A SCRAPER

... if there is an API

OK, CREATE A SCRAPER

with Kimono, a web-based scraping tool

... but only if

  • The source you are scraping is reasonably well structured and cleanly coded
     
  • The content is static rather than dynamically generated with JavaScript (no AJAX calls)
     
  • You don't mind your work being public to all
     
  • You don't have complicated auth requirements
     
  • You don't mind the reliance on a third-party service 
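The static-content condition can be checked roughly before building anything: if the text you want is already in the raw HTML, it is probably not injected by JavaScript. The sketch below is a crude heuristic under that assumption. The function name and both sample pages are hypothetical.

```python
import re


def appears_in_static_html(html, expected_text):
    """Return True if expected_text is visible in the raw HTML markup itself."""
    # Drop <script> and <style> blocks, since their contents are not page text.
    without_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    # Strip the remaining tags and collapse whitespace.
    text = re.sub(r"(?s)<[^>]+>", " ", without_scripts)
    text = " ".join(text.split())
    return expected_text in text


# Hypothetical pages: one static, one that fills in its content with JavaScript.
STATIC_PAGE = "<html><body><ul><li>Price: 42</li></ul></body></html>"
DYNAMIC_PAGE = (
    "<html><body><ul id='list'></ul>"
    "<script>document.getElementById('list').innerHTML = '<li>Price: 42</li>';</script>"
    "</body></html>"
)

static_ok = appears_in_static_html(STATIC_PAGE, "Price: 42")
dynamic_ok = appears_in_static_html(DYNAMIC_PAGE, "Price: 42")
```

Here `static_ok` is true and `dynamic_ok` is false, because in the second page the text only exists inside a script.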

#builtwithkimono

API SETUP 

START PAGE

RECOGNISE SIMILAR DATA

STRUCTURED DATA

MANUALLY CORRECT IF NEEDED

PREVIEW API END POINT

SET A SCHEDULE

SET A CRAWL STRATEGY

USE YOUR API

ACCESS AS JSON / CSV / RSS
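Once the data is available as JSON, converting it to other formats is straightforward. The sketch below flattens a JSON payload into CSV with the standard library. The payload shape (`results` containing a `collection1` list) and the field names are assumptions for illustration, not a documented response format.

```python
import csv
import io
import json

# Hypothetical payload; the real response shape may differ.
PAYLOAD = """
{"name": "example", "results": {"collection1": [
  {"title": "First post", "url": "http://example.test/1"},
  {"title": "Second post", "url": "http://example.test/2"}
]}}
"""


def rows_to_csv(rows):
    """Serialise a list of flat dicts to CSV text, using the first row's keys as header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = json.loads(PAYLOAD)["results"]["collection1"]
csv_text = rows_to_csv(rows)
```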

SYNC WITH A GOOGLE SHEET

ADD A WIDGET TO YOUR SITE

DISTRIBUTE AS AN APP

TRANSFORM THE DATA

GET SEED URLS FOR 2ND SCRAPER
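Chaining scrapers this way amounts to collecting the link field from the first scraper's rows and de-duplicating it. A minimal sketch follows, assuming each row is a dict whose link is stored either as a plain string or as a nested object with an `href` key. Both shapes and the `seed_urls` name are hypothetical.

```python
def seed_urls(rows, field="url"):
    """Collect unique, non-empty link values from scraped rows, keeping first-seen order."""
    seen = set()
    seeds = []
    for row in rows:
        value = row.get(field)
        # Assumed alternative shape: {"text": "...", "href": "..."}
        if isinstance(value, dict):
            value = value.get("href")
        if value and value not in seen:
            seen.add(value)
            seeds.append(value)
    return seeds


# Hypothetical output of a first scraper that lists article links.
FIRST_SCRAPER_ROWS = [
    {"title": "First post", "url": "http://example.test/1"},
    {"title": "Second post", "url": {"text": "Second", "href": "http://example.test/2"}},
    {"title": "First post again", "url": "http://example.test/1"},
    {"title": "No link"},
]

seeds = seed_urls(FIRST_SCRAPER_ROWS)
```

The resulting list is what the second scraper would take as its starting pages.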

DEMO

FREE TIER

  1. Crawl up to 10,000 with a single API
  2. Access your data in standard formats JSON/CSV/RSS
  3. Email alerts and webhooks
  4. Access to the past 30 days of historic data
  5. Integrations with Google Sheets and WordPress

BUSINESS TIER

Probably not for you, but offers:

  1. Private APIs
  2. Auth support (currently in Beta and also available for free)
  3. Change Detection
  4. Outsourced API creation and maintenance

PRICING

TALK BY

Mart van de Ven

m@type.hk

@tijptjik
