with
Kartik
kartik@brown.edu
<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
    <p>
      <a href="https://www.wikipedia.org/">A link to Wikipedia!</a>
    </p>
  </body>
</html>
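A page like this one can be read by a program as well as by a browser. As a minimal sketch, using only Python's standard-library `html.parser` (the class name `LinkCollector` is made up for this illustration and is not taken from the talk), the following pulls out the title and every link:

```python
from html.parser import HTMLParser

# The example page, embedded as a string so the sketch is self-contained.
PAGE = """<!DOCTYPE html>
<html>
<head>
<title>This is a title</title>
</head>
<body>
<p>Hello world!</p>
<p>
<a href="https://www.wikipedia.org/">A link to Wikipedia!</a>
</p>
</body>
</html>"""


class LinkCollector(HTMLParser):
    """Collects the page title and (href, link text) pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(PAGE)
print(collector.title)
print(collector.links)
```

The parser is event-driven: it never builds a tree, it only reacts to opening tags, text, and closing tags as they stream past. That is enough for small jobs like this one.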
Why Scrape the Web?
© Wikipedia
© Douglas Crockford
Tags <…>
#id
.class
text
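The four handles listed above (tag name, `#id`, `.class`, and the text inside an element) can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `SAMPLE` markup, the helper names `matches`, `TextSelector`, and `select_text`, and the restriction to one simple selector at a time on well-formed HTML. A real scraper would more likely lean on a dedicated parsing library with full selector support.

```python
from html.parser import HTMLParser

# Hypothetical markup, made up for this example.
SAMPLE = """
<div id="main">
  <p class="intro">Hello world!</p>
  <p class="intro note">Second paragraph</p>
  <a href="https://www.wikipedia.org/">A link to Wikipedia!</a>
</div>
"""

# Elements that never have a closing tag, so they are not tracked as "open".
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def matches(selector, tag, attrs):
    """True if an element matches a simple 'tag', '#id' or '.class' selector."""
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    return tag == selector


class TextSelector(HTMLParser):
    """Collects the text of every element matching one simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.results = []
        self._open = []  # one [is_match, text_parts] entry per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._open.append([matches(self.selector, tag, dict(attrs)), []])

    def handle_data(self, data):
        # Text belongs to every enclosing element that matched.
        for is_match, parts in self._open:
            if is_match:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        is_match, parts = self._open.pop()
        if is_match:
            # Collapse runs of whitespace so indentation does not leak through.
            self.results.append(" ".join("".join(parts).split()))


def select_text(html, selector):
    """Return the text of each element matching the selector."""
    parser = TextSelector(selector)
    parser.feed(html)
    return parser.results


print(select_text(SAMPLE, "p"))
print(select_text(SAMPLE, "#main"))
print(select_text(SAMPLE, ".note"))
```

An id is expected to be unique on a page, so `#main` yields at most one element, while a tag name or class can match many.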
https://github.com/k4rtik/fetch-topicals
JSON
{
  "1976": {
    "0": {
      "alt": "Use your head!",
      "caption": "When helmets were made compulsory in Bombay",
      "url": "http://www.amul.com/files/hits/amul-hits-1251.gif"
    },
    "1": {
      "alt": "The Big Payoff",
      "caption": "The Big Payoff",
      "url": "http://www.amul.com/files/hits/amul-hits-1250.gif"
    }
  },
  "1986": {
    "0": {
      "alt": "Wahi hota hai jo Manzure Ilahi hota hai.",
      "caption": "The Pakistani cricketer Mansur Ilahi in great form.",
      "url": "http://www.amul.com/files/hits/amul-hits-1145.jpg"
    },
    "1": {
      "alt": "In all Holmes Amul's elementary, my dear.",
      "caption": "During the time when a film on Shrelock Holms was being screened.",
      "url": "http://www.amul.com/files/hits/amul-hits-1144.jpg"
    }
  }
}
# in Python
import json
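To make the `json` import concrete, here is a small sketch that loads a shortened copy of the data shown earlier and walks it. The variable names (`RAW`, `data`, `urls`) are chosen for this example only; the talk does not show how the original project reads its output.

```python
import json

# A shortened copy of the scraped JSON, embedded so the sketch is self-contained.
RAW = """
{
  "1976": {
    "0": {
      "alt": "Use your head!",
      "caption": "When helmets were made compulsory in Bombay",
      "url": "http://www.amul.com/files/hits/amul-hits-1251.gif"
    },
    "1": {
      "alt": "The Big Payoff",
      "caption": "The Big Payoff",
      "url": "http://www.amul.com/files/hits/amul-hits-1250.gif"
    }
  }
}
"""

# Text -> nested dicts.
data = json.loads(RAW)

# Years and indices are strings because JSON object keys are always strings.
first = data["1976"]["0"]
print(first["alt"])

# Walk every entry, sorted numerically by year and then by index.
urls = [
    entry["url"]
    for year in sorted(data, key=int)
    for index, entry in sorted(data[year].items(), key=lambda kv: int(kv[0]))
]
print(urls)

# Nested dicts -> text again.
text = json.dumps(data, indent=2, sort_keys=True)
```

`json.loads` and `json.dumps` work on strings; their file-based counterparts are `json.load` and `json.dump`.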
Others
But there is sometimes hope: APIs!
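When a site offers an API, the scraping step usually shrinks to one HTTP request plus JSON decoding. The sketch below uses only the standard library. The endpoint `https://api.example.com/v1/topicals` and its `year` parameter are invented for illustration and do not refer to any real service, so the network call is left commented out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/v1/topicals"


def build_url(base, **params):
    """Append query parameters to the base URL."""
    return f"{base}?{urlencode(params)}" if params else base


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body into Python objects."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_url(API_URL, year=1976)
print(url)
# data = fetch_json(url)  # not run here: the endpoint is made up
```

Compared with parsing HTML, the response is already structured data, so there are no tags or selectors to maintain when the page layout changes.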