Learning Data Science

Lecture 6
Data Exploration with Pandas

Visualization

Looking at the numbers alone is not enough!

Style is important

Math is important

OpenAI's official
GPT-5 release video

(Aug 2025)

Math is important

We did cool stuff with matplotlib

We did cool stuff with seaborn, too

Text file cheat sheet

File format   When to use
CSV           Tabular data, not too much of it
JSON          Web/APIs
TOML          Human-readable config
YAML          Human-readable config
XML           Working with super old (legacy) systems
TXT           Notes, logs, etc.
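# uv add pyyaml
import yaml

# safe_load only builds plain Python objects (dict, list, str, int, ...),
# which is the usual choice for reading config files
with open("lec5-data/userinfo.yaml", "r") as f:
    data = yaml.safe_load(f)

print(data)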

Lecture 6

  1. Recap
  2. Intro to pandas

Pandas

Unfortunately not a zoology class

Panda facts:

  1. Newborn panda cubs are incredibly small, weighing only around 100 grams (about 4 ounces), roughly 1/900th of their mother's weight.

pandas

  • fast
  • powerful
  • flexible
  • easy to use
  • open source
  • data analysis tool
  • data manipulation tool

(Definition from the pandas website, pandas.pydata.org)

Install

uv add pandas
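A quick way to check the install, followed by a first pd.Series. This is a minimal sketch; the values reuse the Trig grades from the DataFrame anatomy slide, and the variable names are illustrative.

```python
import pandas as pd

print(pd.__version__)  # if this prints a version string, the install worked

# A Series: a 1-D array of values plus an index of labels
trig = pd.Series([1.3, 4.0, 1.3], index=["Lara", "Jay", "Susie"], name="Trig")

print(trig.dtype)    # float64
print(trig["Jay"])   # indexing by label: 4.0
print(trig.iloc[0])  # indexing by position: 1.3

# Series operations are element-wise, like NumPy arrays
shifted = trig + 1

# Boolean indexing: a comparison gives a boolean Series that acts as a mask
low = trig[trig < 2.0]
print(low)
```

Label-based (`trig["Jay"]`) and position-based (`trig.iloc[0]`) indexing are kept separate on purpose: with a labelled index, "the element called Jay" and "the first element" are different questions.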

Everything we will learn today with Pandas

  • Series
  • dtypes
  • indexing
  • Series operations
  • Boolean Indexing
  • DataFrames
  • Inspecting DataFrames
  • Reading/writing tabular data
  • Indexing rows
  • DataFrame boolean indexing
  • Sorting DataFrames
  • Stats of columns
  • DataFrame Operations
  • The magic apply() method
  • Axes in DataFrames
  • Aggregation
  • Plotting from pandas
  • Pandas and Seaborn
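Most of these topics are covered in the live-coded notebook. As a compact, non-authoritative preview, the sketch below touches several of them (inspecting, dtypes, boolean indexing, sorting, column stats, apply(), axes, aggregation) on the grades table from the DataFrame anatomy slide. The variable names are illustrative, not taken from the lecture notebook.

```python
import pandas as pd

# The grades table from the anatomy slide, built from a dict of columns
grades = pd.DataFrame(
    {
        "Trig": [1.3, 4.0, 1.3],
        "Alg": [1.3, 4.3, 1.0],
        "Geom": [3.7, 2.0, 2.0],
        "Calc": [2.3, 2.3, 3.0],
    },
    index=pd.Index(["Lara", "Jay", "Susie"], name="Name"),
)

# Inspecting: shape and the dtype of every column
print(grades.shape)  # (3, 4)
print(grades.dtypes)

# Boolean indexing: keep the rows where the condition is True
calc_below_3 = grades[grades["Calc"] < 3.0]

# Sorting by one column (ascending by default)
by_alg = grades.sort_values("Alg")

# Stats of a single column
trig_mean = grades["Trig"].mean()

# apply(): by default the function receives one column at a time (axis=0)
spread = grades.apply(lambda col: col.max() - col.min())

# Axes: axis=0 collapses the rows (one value per subject),
# axis=1 collapses the columns (one value per student)
subject_means = grades.mean(axis=0)
student_means = grades.mean(axis=1)

# Aggregation: several summary statistics at once
summary = grades.agg(["mean", "max"])
print(summary)

# Plotting from pandas (requires matplotlib):
# grades.plot.bar()
```

The `axis` argument is the part that most often trips people up: it names the direction that gets collapsed, not the one that is kept.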

Anatomy of a Pandas DataFrame

[Figure: a CSV file is opened with pandas to give a pd.DataFrame; taking one column or one row of it gives a pd.Series.]

pd.Series

A fancy numpy array, but with an index column

pd.DataFrame

✨ Fancy Google Sheets or Excel but in Python

CSV file, opened with pandas (Name becomes the index):

Name   Trig  Alg  Geom  Calc
Lara   1.3   1.3  3.7   2.3
Jay    4.0   4.3  2.0   2.3
Susie  1.3   1.0  2.0   3.0

Take one column (e.g. Trig): a pd.Series indexed by name

Lara   1.3
Jay    4.0
Susie  1.3

Take one row (e.g. Jay): a pd.Series indexed by subject

Trig  4.0
Alg   4.3
Geom  2.0
Calc  2.3
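The anatomy above can be reproduced in a few lines. This is a minimal sketch, assuming the CSV has a Name column followed by the four subjects; the CSV text is written inline (via `io.StringIO`) so the example needs no file on disk.

```python
import io

import pandas as pd

# The grades table from the slide, as CSV text
csv_text = """Name,Trig,Alg,Geom,Calc
Lara,1.3,1.3,3.7,2.3
Jay,4.0,4.3,2.0,2.3
Susie,1.3,1.0,2.0,3.0
"""

# index_col="Name" turns the Name column into the row labels (the index)
grades = pd.read_csv(io.StringIO(csv_text), index_col="Name")

# Take one column -> pd.Series indexed by student name
trig = grades["Trig"]

# Take one row -> pd.Series indexed by subject
jay = grades.loc["Jay"]

# The "fancy numpy array" part: the values without the index
trig_values = trig.to_numpy()

print(grades)
print(trig)
print(jay)
```

Both selections return a pd.Series; the difference is which labels end up as its index (student names for a column, subject names for a row).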

Live Coding

The pandas demonstration today is live coded.

If you are checking back after the lecture, please view the rest of the lecture notes in this notebook:

[URL soon]

Challenge #1

import pandas as pd

# Each entry is one order
orders = pd.Series(
    [
        "Americano",
        "Cappuccino",
        "Cold Brew",
        "Espresso",
        "Flat White",
        "Cappuccino",
        "Cold Brew",
        "Mocha",
        "Cold Brew",
        "Matcha Latte",
        "Cappuccino",
        "Espresso",
        "Chai",
        "Matcha Latte",
        "Cortado",
        "Cold Brew",
        "Americano",
        "Cappuccino",
        "Mocha",
        "Matcha Latte",
        "Cappuccino",
        "Cold Brew",
        "Mocha",
        "Cold Brew",
        "Matcha Latte",
        "Cold Brew",
        "Flat White",
        "Cappuccino",
        "Espresso",
        "Cold Brew",
        "Cappuccino",
        "Espresso",
        "Chai",
        "Matcha Latte",
        "Cortado",
        "Cappuccino",
        "Cold Brew",
        "Mocha",
        "Cold Brew",
        "Matcha Latte",
        "Americano",
        "Cappuccino",
        "Cold Brew",
        "Espresso",
        "Flat White",
    ],
    dtype="string",
)

How many orders were for "Cappuccino", "Cold Brew", and "Matcha Latte"?
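One possible approach, sketched under the assumption that the question wants both each drink's count and the combined total (the wording allows either reading). `value_counts()` tallies every distinct value, and `isin()` builds a boolean mask whose sum counts the matches. The orders are repeated here, five per line, so the block runs on its own.

```python
import pandas as pd

# Same 45 orders as in the challenge, written five per line
orders = pd.Series(
    [
        "Americano", "Cappuccino", "Cold Brew", "Espresso", "Flat White",
        "Cappuccino", "Cold Brew", "Mocha", "Cold Brew", "Matcha Latte",
        "Cappuccino", "Espresso", "Chai", "Matcha Latte", "Cortado",
        "Cold Brew", "Americano", "Cappuccino", "Mocha", "Matcha Latte",
        "Cappuccino", "Cold Brew", "Mocha", "Cold Brew", "Matcha Latte",
        "Cold Brew", "Flat White", "Cappuccino", "Espresso", "Cold Brew",
        "Cappuccino", "Espresso", "Chai", "Matcha Latte", "Cortado",
        "Cappuccino", "Cold Brew", "Mocha", "Cold Brew", "Matcha Latte",
        "Americano", "Cappuccino", "Cold Brew", "Espresso", "Flat White",
    ],
    dtype="string",
)

wanted = ["Cappuccino", "Cold Brew", "Matcha Latte"]

# value_counts(): how often each distinct value occurs
counts = orders.value_counts()
per_drink = counts.loc[wanted]

# isin(): True where the order is any of the wanted drinks; summing counts the Trues
total = int(orders.isin(wanted).sum())

print(per_drink)
print(total)
```

`value_counts()` answers "how many of each" in one call, while the boolean-mask route generalizes to any condition you can write as a comparison.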

First CSV file
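Writing and reading a CSV file with pandas is a round trip through `to_csv` and `read_csv`. A minimal sketch, assuming a hypothetical file name `grades.csv` inside a temporary directory so nothing is left behind; the data is a subset of the grades table from the anatomy slide.

```python
import tempfile
from pathlib import Path

import pandas as pd

grades = pd.DataFrame(
    {"Trig": [1.3, 4.0, 1.3], "Calc": [2.3, 2.3, 3.0]},
    index=pd.Index(["Lara", "Jay", "Susie"], name="Name"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "grades.csv"  # hypothetical file name
    # The index (Name) is written as the first column of the file
    grades.to_csv(path)
    # index_col="Name" restores it as the index when reading back
    loaded = pd.read_csv(path, index_col="Name")

print(loaded)
```

Without `index_col`, the Name column would come back as an ordinary column and pandas would add a fresh 0, 1, 2 index instead.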


The End

By astrojarred
