Intro to BS4

- Akul Mehra -

What is Web Scraping?

Consider the following HTML code:


<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
        </p>
        <p>
            Here's a second paragraph of text!
        </p>
    </body>
</html>

This is how a browser renders the web page.

HTML DOM TREE
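
As a rough textual stand-in for the tree, the sketch below prints the nesting of the sample page using only Python's standard-library `html.parser`. The names `SAMPLE_HTML` and `TreePrinter` are made up for this illustration, and the depth tracking is a simplification: it assumes every opening tag has a matching closing tag, which holds for the sample but not for void tags such as `<br>`.

```python
from html.parser import HTMLParser

# The sample page from the earlier slide.
SAMPLE_HTML = """<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
        </p>
        <p>
            Here's a second paragraph of text!
        </p>
    </body>
</html>"""


class TreePrinter(HTMLParser):
    """Collects one indented line per tag or non-empty text node."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # current nesting level
        self.lines = []  # indented outline, one entry per node

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.lines.append("  " * self.depth + repr(text))


printer = TreePrinter()
printer.feed(SAMPLE_HTML)
printer.close()
print("\n".join(printer.lines))
```

The outline shows `html` at the root, `head` and `body` as its children, and the two `p` elements (each holding one text node) under `body`.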

What is BeautifulSoup?

Beautiful Soup is a Python library for pulling data out of HTML and XML files.
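
A minimal sketch of what "pulling data out" can look like, applied to the sample page from the earlier slide. The variable names are illustrative, and the choice of the built-in `"html.parser"` backend is an assumption made here so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# The sample page from the earlier slide.
html_doc = """<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
        </p>
        <p>
            Here's a second paragraph of text!
        </p>
    </body>
</html>"""

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(html_doc, "html.parser")

# Collect the text of every <p> element, trimming surrounding whitespace.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
print(paragraphs)
```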

To install via pip:

$ pip install beautifulsoup4

Creating a Scraper using BeautifulSoup

  • Download the webpage using the Requests library.
  • Create a BeautifulSoup object from the page.
  • Scrape the required results.
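
The three steps above might be sketched as follows. The function names (`download_page`, `extract_paragraphs`, `scrape_paragraphs`) are hypothetical, and treating the paragraph text as "the required results" is an assumption for illustration; a real scraper would select whatever elements it actually needs.

```python
import requests
from bs4 import BeautifulSoup


def download_page(url):
    """Step 1: download the page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP error statuses
    return response.text


def extract_paragraphs(html):
    """Steps 2 and 3: build a BeautifulSoup object, then scrape from it."""
    soup = BeautifulSoup(html, "html.parser")
    # Illustrative choice: the text of every <p> element.
    return [p.get_text(strip=True) for p in soup.find_all("p")]


def scrape_paragraphs(url):
    """Run all three steps for a single URL."""
    return extract_paragraphs(download_page(url))
```

Calling `scrape_paragraphs("https://example.com")` (a placeholder URL) would return a list of paragraph strings. Keeping the download separate from the parsing means the scraping logic can be tried on a saved HTML string without touching the network.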

Questions?

Thank You
