HTTP Request
HTML File
Beautiful Soup will parse the HTML for us!
Websites are a mess
Anything inside double or single quotes is a string
Any text value that you use in a program needs to be in quotes; otherwise Python will look for a variable with that name and give you an error if it doesn't find one.
Numbers can be either integers (whole numbers) or floats (numbers with a fractional part)
A number between quotes is a string, not an integer or float
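A minimal sketch of the points above. The values (page count, price) are made up for illustration, and the unquoted name hello is deliberately undefined to show the error.

```python
# Text values go in quotes; numbers do not.
title = "War and Peace"     # string
pages = 1225                # integer (illustrative value)
price = 51.77               # float (illustrative value)
pages_as_text = "1225"      # string, because of the quotes

# type() tells us what kind of value we have.
kinds = [type(title).__name__, type(pages).__name__,
         type(price).__name__, type(pages_as_text).__name__]
print(kinds)

# Without quotes, Python looks for a variable called hello and fails.
try:
    print(hello)
except NameError as error:
    message = str(error)
    print("Error:", message)

# A quoted number behaves like text until we convert it.
print(pages_as_text + "1")       # joins two strings
print(int(pages_as_text) + 1)    # adds two numbers
```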
Variables are just a way programmers name values, like in math class!
Variables can be re-assigned
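As a small illustration of naming and re-assigning values (the variable name and numbers are invented for this sketch):

```python
# Give a value a name.
price = 10
first_price = price

# Re-assign: the name now points to a new value.
price = price + 5

# A variable can even be re-assigned to a different type of value.
price_label = "Price: " + str(price)
print(first_price, price, price_label)
```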
Boolean values: True, False
If and Else conditions
Logical operators (or, and)
While and For loops
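The building blocks above can be sketched together in a few lines. The price and the stock flag are hypothetical values chosen only to exercise each branch.

```python
price = 51.77       # illustrative value
in_stock = True     # a Boolean value

# if / else with "and": both sides must be True.
if price < 20 and in_stock:
    verdict = "buy"
else:
    verdict = "skip"

# "or": at least one side must be True.
worth_a_look = price < 20 or in_stock

# for loop: repeat once for each value in a sequence.
total = 0
for number in range(1, 4):   # 1, 2, 3
    total = total + number

# while loop: repeat as long as the condition stays True.
countdown = 3
while countdown > 0:
    countdown = countdown - 1

print(verdict, worth_a_look, total, countdown)
```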
Python has a built-in List type for storing a collection of values.
Generally we don't know in advance what we will collect, so we start with an empty list and add values to it
To do something with each element of the list, we iterate over it.
(It does not matter what we name the first variable in the for loop; however, the second variable must be the one that contains the list)
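A short sketch of the list workflow described above: start empty, add values, then iterate. The second title is an assumption added for illustration.

```python
# Start with an empty list and add values to it.
titles = []
titles.append("War and Peace")
titles.append("Anna Karenina")

# Iterate: "title" is a name we chose freely,
# "titles" must be the variable that holds the list.
shouted = []
for title in titles:
    shouted.append(title.upper())

print(len(titles), shouted)
```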
Another common built-in data structure is called a Dictionary
Dictionaries are used to store “key-value” information. Here, “War and Peace” is the key, and the description is the value. So it works like a real… dictionary! 📖
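A minimal dictionary sketch along the same lines. The descriptions are placeholders written for this example, not taken from the original slide.

```python
# Keys on the left of the colon, values on the right.
books = {"War and Peace": "A novel by Leo Tolstoy"}

# Add a new key-value pair.
books["Anna Karenina"] = "Another novel by Leo Tolstoy"

# Look up a value by its key, like looking up a word in a dictionary.
description = books["War and Peace"]

# Iterate over keys and values together.
lines = []
for title, text in books.items():
    lines.append(title + ": " + text)

print(description)
print(lines)
```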
Pre-built functions: type(), str()...
Import functions from downloaded packages
pandas, numpy, matplotlib...
Defining your own with def
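The three kinds of functions can be sketched as follows. To keep the example runnable without installing anything, it imports math from the standard library as a stand-in for a downloaded package such as pandas; the function add_tax and its default rate are hypothetical.

```python
import math   # stand-in for an installed package like pandas or numpy

# Built-in functions are always available.
kind = type(3.5).__name__     # name of the value's type
as_text = str(3.5)            # convert a number to a string

# Imported functions are called through the module name.
root = math.sqrt(16)

# Our own function, defined with the def keyword.
def add_tax(price, rate=0.2):
    """Return the price with tax added (rate is an assumed default)."""
    return round(price * (1 + rate), 2)

with_tax = add_tax(10)
print(kind, as_text, root, with_tax)
```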
https://books.toscrape.com
Importing our Libraries
import requests
from bs4 import BeautifulSoup
Generic way to use BeautifulSoup
response = requests.get("https://books.toscrape.com/")
soup = BeautifulSoup(response.content, "html.parser")
Finding the first h3 element
soup.find("h3")
Finding all h3 elements
soup.find_all("h3")
Finding the first element with the product_price class
soup.find(class_="product_price")
CSS Selectors
soup.select("h3 p")
soup.select(".product_pod")
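Putting the calls above together, here is a hedged sketch that collects titles and prices into a list of dictionaries. So that it runs without a network connection, it parses a small inline HTML snippet instead of the live page; the snippet's structure, titles, and prices are assumptions made for this example and may not match the real site exactly.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, loosely modelled on a product listing page.
html = """
<article class="product_pod">
  <h3><a href="book-1.html" title="Book One">Book One</a></h3>
  <div class="product_price"><p class="price_color">10.00</p></div>
</article>
<article class="product_pod">
  <h3><a href="book-2.html" title="Book Two">Book Two</a></h3>
  <div class="product_price"><p class="price_color">12.50</p></div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, find_all() returns every match.
first_heading = soup.find("h3").get_text()
heading_count = len(soup.find_all("h3"))

# select() takes a CSS selector; ".product_pod" means class="product_pod".
books = []
for pod in soup.select(".product_pod"):
    title = pod.find("h3").find("a")["title"]
    price = pod.find(class_="product_price").get_text(strip=True)
    books.append({"title": title, "price": price})

print(first_heading, heading_count)
print(books)
```

To try this on the real page, replace the html string with response.content from the requests.get call and adjust the selectors to whatever the page actually contains.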