Web Scraping with Python

Bhavika Tekwani



Prerequisites

  • Some knowledge of Python

Why scrape a webpage?

  • Save data trapped in webpages
  • Obtain data in the absence of an API
  • Stay anonymous

Workshop Outline

  • BeautifulSoup & Requests
  • Scrape weather.gov
  • Save data to a CSV file


  1. Go to weather.gov
  2. Find the "Local forecast by City, State, ZIP" search box
  3. Enter Washington DC



from bs4 import BeautifulSoup
import requests

# specify the URL you're visiting

link = "http://forecast.weather.gov/MapClick.php?lat=38.8951&lon=-77.0364#.WO-9S3UrLCI"

# request a web page!

page = requests.get(link)

# 200 means success, 404 means page not found, 500 means server error
print(page.status_code)
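
The status codes mentioned in the comment above can also be looked up by name. The following is a minimal sketch using the standard library's `http.HTTPStatus` enum; the helper name `describe_status` is hypothetical and is not part of Requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable phrase for an HTTP status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Not a status code the standard library knows about
        return "Unknown status"


for code in (200, 404, 500):
    print(code, describe_status(code))
```

In the workshop script itself, calling `page.raise_for_status()` is another option: Requests raises `requests.HTTPError` for 4xx and 5xx responses, so the script stops before it tries to parse an error page.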

# show the raw HTML of the webpage
print(page.content)

soup = BeautifulSoup(page.content, 'html.parser')

# find the ID for the seven day forecast section of the page
# use the 'find' method to get that section

seven_day = soup.find(id='seven-day-forecast')

# class is an HTML attribute that groups elements; CSS stylesheets usually use it to apply styles
# find - returns the first element that matches the search
# find_all - returns a list of all elements that match the search

forecast_items = seven_day.find_all(class_="tombstone-container")

print(forecast_items)

tonight = forecast_items[0]

# Find the image in this section by its 'img' tag
# The full description is stored in the image's 'title' attribute

img = tonight.find("img")
desc = img['title']


# Let's get the period names for the whole seven-day forecast

period_tags = seven_day.select(".tombstone-container .period-name")

periods = [pt.get_text() for pt in period_tags]
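
To try `find`, `find_all`, and `select` without a network connection, the same calls can be run against a small HTML string. This is only a sketch: the markup in `sample_html` is made up to imitate the structure described above and is not copied from weather.gov.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates the forecast structure (an assumption, not the real page)
sample_html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <img src="night.png" title="Tonight: Mostly clear, with a low around 45.">
    <p class="short-desc">Mostly Clear</p>
    <p class="temp">Low: 45 F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Friday</p>
    <img src="day.png" title="Friday: Sunny, with a high near 70.">
    <p class="short-desc">Sunny</p>
    <p class="temp">High: 70 F</p>
  </div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_seven_day = sample_soup.find(id="seven-day-forecast")

# find returns only the first matching element
first_item = sample_seven_day.find(class_="tombstone-container")
first_title = first_item.find("img")["title"]

# find_all returns every matching element as a list
items = sample_seven_day.find_all(class_="tombstone-container")

# select takes a CSS selector: .period-name elements inside .tombstone-container
sample_periods = [
    tag.get_text()
    for tag in sample_seven_day.select(".tombstone-container .period-name")
]

print(first_title)
print(len(items))
print(sample_periods)
```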


Scraping multiple items

# Repeat the same step for each piece of text that we want to extract
# short_descs holds the short descriptions
# temps holds the temperatures
# descs holds the full descriptions (from each image's title attribute)

short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]


Making Lists out of Extracted Text

Writing data to a file

# zip - pairs up corresponding elements of multiple lists

# For example,
#    a = [1, 2, 3]
#    b = [4, 5, 6]
#    c = list(zip(a, b))
#    c looks like: [(1, 4), (2, 5), (3, 6)]
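
One caveat when zipping scraped lists: `zip` stops at the shortest input, so if one selector matches fewer elements than the others, the trailing rows are dropped without any error. The sketch below illustrates this with made-up values and shows `itertools.zip_longest` as one possible alternative; whether padding is appropriate depends on the data.

```python
from itertools import zip_longest

# Made-up sample values; the temperature for the last period is missing
sample_periods = ["Tonight", "Friday", "Friday Night"]
sample_temps = ["Low: 45", "High: 70"]

# zip stops at the shortest list, so "Friday Night" is silently dropped
rows = list(zip(sample_periods, sample_temps))

# zip_longest keeps every period and fills the gap with a placeholder
padded_rows = list(zip_longest(sample_periods, sample_temps, fillvalue=""))

print(rows)
print(padded_rows)
```

Comparing `len()` of each list before zipping is a cheap way to notice a mismatch early.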

import csv

data = list(zip(periods, short_descs, temps, descs))
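
Before writing to disk, the `csv.writer` calls can be tried on an in-memory buffer. This is a minimal sketch with made-up rows shaped like `data`; the header names (`period`, `short_desc`, `temp`) are assumptions chosen for illustration.

```python
import csv
import io

# Made-up rows in the same shape as the zipped data
sample_rows = [
    ("Tonight", "Mostly Clear", "Low: 45"),
    ("Friday", "Sunny", "High: 70"),
]

# In-memory stand-in for a file opened with newline=''
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=",")
writer.writerow(["period", "short_desc", "temp"])  # header row (assumed column names)
writer.writerows(sample_rows)  # one CSV line per tuple

csv_text = buffer.getvalue()
print(csv_text)

# Read the text back to confirm that it round-trips
read_back = list(csv.reader(io.StringIO(csv_text, newline="")))
print(read_back)
```

`csv.writer` ends each row with `\r\n` by default, which is why the `csv` documentation recommends opening real files with `newline=''`.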

# Open a file in write mode - 'w'
with open('weather.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter=',')
    for i in data:
        l = list(i)