WEB CRAWLER
WITH RUBY
Rifki Fauzi, ST
@kubid
AN INTERNET BOT THAT SYSTEMATICALLY BROWSES THE WORLD WIDE WEB, TYPICALLY FOR THE PURPOSE OF WEB INDEXING.
RUBYGEMS
OPEN-URI
NOKOGIRI
OPEN-URI
OpenURI is an easy-to-use wrapper for net/http, net/https and net/ftp.
NOKOGIRI
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser
IT CAN PARSE
HTML, XML & SAX
- parsing from string
- parsing from file
- parsing from internet / web url
PARSING FROM
INTERNET
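require 'nokogiri'
require 'open-uri'
# URI.open replaces the bare open(url), which no longer accepts URLs since Ruby 3.0
doc = Nokogiri::HTML(URI.open("http://www.somewebsite.com/"))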
Searching an HTML / XML Document
Searching an HTML/XML document
is like searching the DOM with jQuery
characters = doc.css("body .sitcoms div.name")
# will return a Nokogiri::XML::NodeSet holding every
# <div class="name"> found inside an element with class "sitcoms"
THAT'S IT
VERY EASY, RIGHT?
THANKS