Project Scaffolding
uv init -p 3.13
uv add numpy
Python Notebooks
Ways to interact with Python code
REPL
✅ Interactive
❌ Sharable
❌ Version Control friendly
❌ Reproducible
❌ Mix code, text, plots
Scripts
❌ Interactive
✅ Sharable
✅ Version Control friendly
✅ Reproducible
❌ Mix code, text, plots
Notebooks
✅ Interactive
✅ Sharable
⚠️ Version Control friendly
⚠️ Reproducible
✅ Mix code, text, plots
Linear Algebra in a Nutshell
The math of vectors and matrices
How to combine and transform them
Everything is a tensor
Graphics from: St. Lawrence U CS140 and Montesinos-López et al (2022)
Image width (4px)
Image height (4px)
Image "depth" (3 color channels)
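As a concrete instance of the image example, a 4x4 pixel color image fits in a single 3D array. This is a minimal sketch: the (height, width, channels) axis order and the uint8 dtype are assumptions chosen for illustration, not something the slides specify.

```python
import numpy as np

# Assumed layout: (height, width, color channels), 8-bit values per channel
image = np.zeros((4, 4, 3), dtype=np.uint8)

# Paint the top-left pixel pure red: (R, G, B) = (255, 0, 0)
image[0, 0] = [255, 0, 0]

print(image.ndim)   # number of axes
print(image.shape)  # size along each axis
print(image.size)   # total number of stored values
```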
NumPy
an array library that underpins most of scientific Python
Fancy lists
Vectors
Matrices
Tensors
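One way to see why arrays are more than "fancy lists": the same operators behave differently on a list and on an array. A small sketch (the variable names are illustrative only):

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On a list, * repeats the list and + concatenates
list_times_two = values * 2
list_plus_list = values + values

# On an array, the same operators work element by element
arr_times_two = arr * 2
arr_plus_arr = arr + arr

print(list_times_two, arr_times_two)
print(list_plus_list, arr_plus_arr)
```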
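import numpy as np
# build an array directly from values
x = np.array([1, 2, 3, 4, 5])
# OR
l = [1, 2, 3, 4, 5]
x = np.array(l)
# arrays filled with ones or zeros
x = np.ones(5)
x = np.zeros(5)
# an array of ones with the same shape and dtype as x
x = np.array([1, 2, 3, 4, 5])
y = np.ones_like(x)
# evenly spaced values
x = np.arange(6)
x = np.arange(1, 6, 2)
x = np.linspace(0, 1, 11)
rng = np.random.default_rng()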
# 5 ints in [0, 100)
rng.integers(0, 100, 5)
# 10 floats in [0, 1)
rng.random(10)
# 6 samples from a Gaussian
# mean=5, std=3
rng.normal(5, 3, 6)
x = np.arange(1, 6)
# [1, 2, 3, 4, 5]
x[1] # = 2
x[2:4] # = [3, 4]
# everything from index -2 and onwards
x[-2:] # = [4, 5]
# Every 2 indices (step=2)
x[::2] # = [1, 3, 5]
# specifically indices 1, 3, 4
x[[1, 3, 4]] # = [2, 4, 5]
x = np.array([1, 2])
y = np.array([3, 6])
x + y # = [4, 8]
x * y # = [3, 12]
x + 2 # = [3, 4]
np.sqrt(x)
np.sin(x)
x.max()
x.sum()
x.mean()
# etc
np.array([1, 2, 3, 4])
Normal 1D Vector (4,)
4 elements along one axis (no rows or columns)
np.array([[1, 2, 3, 4]])
Row 2D Vector (1, 4)
1 row, 4 columns
np.array([[1], [2], [3], [4]])
Column 2D Vector (4, 1)
4 rows, 1 column
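A minimal sketch of the three vector shapes on this slide, and of moving between them with reshape, .T and flatten (variable names are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4])   # 1D vector
row = x.reshape(1, -1)       # -1 lets NumPy infer that axis: one row
col = x.reshape(-1, 1)       # one column

print(x.shape, row.shape, col.shape)

# .T swaps the two axes of a 2D array, turning a row into a column
print(row.T.shape)
# flatten() goes back to 1D from either 2D form
print(col.flatten().shape)
# On a 1D array .T changes nothing, since there is only one axis
print(x.T.shape)
```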
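Converting between these shapes:
x.T: row vector ↔ column vector
x.reshape(1, -1): 1D vector → row vector
x.reshape(-1, 1): 1D vector → column vector
x.flatten(): row or column vector → 1D vector
Why Visualize Anything?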
Debatable!
Clearly not true for everyone!
However, we use our sight often to better understand scientific results:
Listening to data
"When we synchronize our different ways of perceiving the world, our sensitivity to events that are masked to the eye ... increases exponentially."
While today we're going to talk about visual ways to communicate data, always remember there are other ways too!
Why Visualization?
[[-0.99582463 0.99785717]
[-0.98329854 0.9975254 ]
[-0.97077244 0.99820004]
[-0.95824635 0.99651706]
[-0.94572025 0.99664487]
[-0.93319415 0.99538239]
[-0.92066806 0.995524 ]
[-0.90814196 0.99469056]
[-0.89561587 0.99384034]
[-0.88308977 0.99323822]
[-0.87056367 0.99233328]
[-0.85803758 0.99210751]
[-0.84551148 0.99107832]
[-0.83298539 0.99010279]
[-0.82045929 0.98942019]
[-0.80793319 0.98934844]
[-0.7954071 0.98927283]
[-0.782881 0.98808468]
[-0.77035491 0.98737426]
[-0.75782881 0.98696232]
[-0.74530271 0.98682547]
[-0.73277662 0.98587808]
[-0.72025052 0.98514856]
[-0.70772443 0.98531636]
[-0.69519833 0.98464728]
[-0.68267223 0.9847355 ]
[-0.67014614 0.98388559]
[-0.65762004 0.98306121]
[-0.64509395 0.98279164]
[-0.63256785 0.98383771]
[-0.62004175 0.98375677]
[-0.60751566 0.98293767]
[-0.59498956 0.98383009]
[-0.58246347 0.98452698]
[-0.56993737 0.98345948]
[-0.55741127 0.98350519]
[-0.54488518 0.98408205]
[-0.53235908 0.9836294 ]
[-0.51983299 0.98374858]
[-0.50730689 0.98364951]
[-0.49478079 0.98381505]
[-0.4822547 0.98420902]
[-0.4697286 0.98364914]
[-0.45720251 0.98383823]
[-0.44467641 0.9827292 ]
[-0.43215031 0.98349927]
[-0.41962422 0.98315183]
[-0.40709812 0.98297283]
[-0.39457203 0.9830812 ]
[-0.38204593 0.98327179]
[-0.36951983 0.98374565]
[-0.35699374 0.98282346]
[-0.34446764 0.98334186]
[-0.33194154 0.9831124 ]
[-0.31941545 0.98315234]
[-0.30688935 0.98366272]
[-0.29436326 0.98345594]
[-0.28183716 0.9837556 ]
[-0.26931106 0.98313967]
[-0.25678497 0.98290479]
[-0.24425887 0.98307619]
[-0.23173278 0.98324617]
[-0.21920668 0.98320411]
[-0.20668058 0.9826849 ]
[-0.19415449 0.98314091]
[-0.18162839 0.98318293]
[-0.1691023 0.98408642]
[-0.1565762 0.98381884]
[-0.1440501 0.98271634]
[-0.13152401 0.9829331 ]
[-0.11899791 0.98245396]
[-0.10647182 0.98279521]
[-0.09394572 0.98315076]
[-0.08141962 0.98350076]
[-0.06889353 0.98272155]
[-0.05636743 0.98274717]
[-0.04384134 0.98214642]
[-0.03131524 0.98293766]
[-0.01878914 0.98257601]
[-0.00626305 0.98278833]
[ 0.00626305 0.98264936]
[ 0.01878914 0.98296328]
[ 0.03131524 0.98229963]
[ 0.04384134 0.98241852]
[ 0.05636743 0.98386052]
[ 0.06889353 0.98249822]
[ 0.08141962 0.9825797 ]
[ 0.09394572 0.98375928]
[ 0.10647182 0.98419352]
[ 0.11899791 0.98257071]
[ 0.13152401 0.98290075]
[ 0.1440501 0.98319426]
[ 0.1565762 0.98375958]
[ 0.1691023 0.98268469]
[ 0.18162839 0.98299352]
[ 0.19415449 0.98341565]
[ 0.20668058 0.98329256]
[ 0.21920668 0.98298301]
[ 0.23173278 0.98309564]
[ 0.24425887 0.98261577]
[ 0.25678497 0.98308789]
[ 0.26931106 0.98309491]
[ 0.28183716 0.98331351]
[ 0.29436326 0.98301856]
[ 0.30688935 0.98345023]
[ 0.31941545 0.98368849]
[ 0.33194154 0.98336824]
[ 0.34446764 0.98347031]
[ 0.35699374 0.98337527]
[ 0.36951983 0.98337933]
[ 0.38204593 0.98311682]
[ 0.39457203 0.98355906]
[ 0.40709812 0.98342144]
[ 0.41962422 0.98432639]
[ 0.43215031 0.9841481 ]
[ 0.44467641 0.98370361]
[ 0.45720251 0.98327543]
[ 0.4697286 0.98316794]
[ 0.4822547 0.98412248]
[ 0.49478079 0.98378512]
[ 0.50730689 0.98390695]
[ 0.51983299 0.98305279]
[ 0.53235908 0.98415821]
[ 0.54488518 0.98400649]
[ 0.55741127 0.9834189 ]
[ 0.56993737 0.98371369]
[ 0.58246347 0.98404789]
[ 0.59498956 0.98400435]
[ 0.60751566 0.98289261]
[ 0.62004175 0.98302591]
[ 0.63256785 0.98307655]
[ 0.64509395 0.98339993]
[ 0.65762004 0.98413959]
[ 0.67014614 0.98278563]
[ 0.68267223 0.98402645]
[ 0.69519833 0.98454638]
[ 0.70772443 0.98499224]
[ 0.72025052 0.98516417]
[ 0.73277662 0.98508764]
[ 0.74530271 0.98643427]
[ 0.75782881 0.98753893]
[ 0.77035491 0.98680817]
[ 0.782881 0.98836621]
[ 0.7954071 0.98850978]
[ 0.80793319 0.98925487]
[ 0.82045929 0.9895102 ]
[ 0.83298539 0.99045925]
[ 0.84551148 0.99178033]
[ 0.85803758 0.99216214]
[ 0.87056367 0.99328812]
[ 0.88308977 0.99372549]
[ 0.89561587 0.99407863]
[ 0.90814196 0.99523416]
[ 0.92066806 0.99532697]
[ 0.93319415 0.99607476]
[ 0.94572025 0.99619085]
[ 0.95824635 0.99687223]
[ 0.97077244 0.99706969]
[ 0.98329854 0.99821132]
[ 0.99582463 0.9977719 ]]
Why Visualization?
Alberto Cairo's "Datasaurus" and the "Datasaurus Dozen" built on it (Matejka & Fitzmaurice, 2017)
Looking at the numbers alone is not enough!
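The same effect can be reproduced on a tiny scale. In this sketch the two made-up arrays `a` and `b` (not taken from the Datasaurus data) share the same mean and standard deviation even though their values are arranged quite differently:

```python
import numpy as np

# Two hypothetical datasets, chosen so their summary statistics agree
a = np.array([-1.0, 1.0, -1.0, 1.0])
b = np.array([-np.sqrt(2), 0.0, 0.0, np.sqrt(2)])

for name, data in [("a", a), ("b", b)]:
    # mean and (population) standard deviation of each dataset
    print(name, data.mean(), data.std())

# The summaries match, yet the values themselves do not
print(np.array_equal(np.sort(a), np.sort(b)))
```

Only looking at the values (or plotting them) reveals the difference.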
Challenge #1
Find what is off, weird, confusing, misleading, or 'too much' in each of the following visualizations:
Previous president of Colombia claiming that collective homicides went down during his time in office (2018-2020)
OpenAI's official ChatGPT 5 release video (Aug 2025)
matplotlib
Create visualizations:
Generally flexible enough to do whatever you need
Installing matplotlib
uv add matplotlib
Importing matplotlib
import matplotlib.pyplot as plt
Everyone imports this package under the nickname plt!
Plotting Functions with line plots
Making Plots
x = np.array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])
y = [i**2 for i in x]
plt.plot(x, y)
plt.show()
Making Plots
x = np.linspace(-10, 10, 1000)
y = [i**2 for i in x]
plt.plot(x, y)
plt.show()
Add title and axis labels
x = np.linspace(-10, 10, 1000)
y = [i**2 for i in x]
plt.plot(x, y)
plt.title("A plot of $f(x) = x^2$")
plt.xlabel("x")
plt.ylabel("$x^2$")
plt.show()
Scatter Plot
rng = np.random.default_rng()
x = rng.normal(0, 1, 20)
y = rng.normal(0, 1, 20)
plt.scatter(x, y)
plt.show()
Good for unordered, 2D data
Histograms
x = rng.normal(171, 20, 2000)
plt.hist(x)
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
plt.title("Height Distribution of 2000 People")
plt.show()
Shows the frequency distribution of 1D data
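To see the numbers behind the bars, the same binning can be computed without plotting via np.histogram. A sketch, assuming a fixed seed (0) and 10 bins, both chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is repeatable
heights = rng.normal(171, 20, 2000)

# counts[i] is the number of samples between edges[i] and edges[i + 1]
counts, edges = np.histogram(heights, bins=10)

print(counts)
print(edges)
print(counts.sum())   # every sample lands in exactly one bin
```

Passing the same `bins` value to `plt.hist` draws these counts as bars.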
Axes and Labels
rng = np.random.default_rng()
x = rng.normal(0, 1, 20)
y = rng.normal(0, 1, 20)
plt.scatter(x, y)
plt.show()
Same scatter as before
Axes and Labels
rng = np.random.default_rng()
x = rng.normal(0, 1, 200)
y = rng.normal(0, 1, 200)
plt.scatter(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("My Great Scatter Plot")
plt.show()
You can set the x- and y-axis labels, as well as the title
Overplotting and Legends
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1)
plt.plot(x, y2)
plt.show()
Everything you plot before calling show() ends up on the same axes!
Overplotting and Legends
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, label="sin(x)")
plt.plot(x, y2, label="cos(x)")
plt.legend()
plt.show()
Use the 'label' kwarg, and add plt.legend() to automatically make a legend!
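The same overplot can also be written with matplotlib's object-oriented interface (`plt.subplots`), which the slides do not use. This is a sketch only; the Agg backend is an assumption made so the snippet runs without a display, and in a notebook you would drop those two lines.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# One figure holding one set of axes; every plot call targets `ax` explicitly
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_title("sin and cos on one set of axes")

# The legend is built from the label of each line
legend = ax.legend(loc="lower right")
print([text.get_text() for text in legend.get_texts()])
```

Being explicit about `ax` helps once a figure contains more than one set of axes.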
Overplotting and Legends
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, label="sin(x)")
plt.plot(x, y2, label="cos(x)")
plt.legend(loc="lower right")
plt.show()
Use the loc kwarg to select the position of the legend!
Style
Line Style
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, label="sin(x)", linestyle="dashed")
plt.plot(x, y2, label="cos(x)", linestyle="dotted")
plt.legend()
plt.show()
Use the linestyle kwarg to change how each line is drawn!
Line Style
plt.plot(x, y1, label="sin(x)", linewidth=5, linestyle="dashed")
plt.plot(x, y2, label="cos(x)", linewidth=10, linestyle="dotted")
plt.legend()
plt.show()
Use the linewidth kwarg to change the width of your lines
Marker Style
plt.scatter(x, y, marker="v", label="People who do")
plt.scatter(x2, y2, marker="x", label="People who do not")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("My Great Scatter Plot")
plt.legend()
plt.show()
Change the shape of points with the marker kwarg (the matplotlib.markers documentation has the full list)
Color
New overplots automatically cycle through a list of colors
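A quick way to inspect that cycle, as a sketch: draw a few lines without a color and ask each one which color it received. The Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Three horizontal lines drawn without an explicit color
lines = [ax.plot([0, 1], [i, i])[0] for i in range(3)]
colors = [line.get_color() for line in lines]
print(colors)

# The list of colors matplotlib cycles through by default
cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
print(cycle)

plt.close(fig)
```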
Color
You can also set the colors you want specifically.
plt.scatter(x, y, color="red", label="People who do", marker="v")
plt.scatter(x2, y2, color="blue", label="People who do not", marker="x")
plt.scatter(x3, y3, color="green", label="People who might", marker="D")
plt.scatter(x4, y4, color="#6f42c1", label="People who would never", marker="+")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("My Great Scatter Plot")
plt.legend()
plt.show()
Marker Size
Use the 's' kwarg: the marker area, given in points squared (points^2)
plt.scatter(x, y, s=5, label="People who do", marker="v")
plt.scatter(x2, y2, s=100, label="People who do not", marker="x")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("My Great Scatter Plot")
plt.legend()
plt.show()
Alpha (Transparency)
Change the opacity of curves/markers. Alpha is in the range [0,1]
plt.plot(x, y1, alpha=0.9, color="#5a2094")
plt.plot(x, y2, alpha=0.8, color="#5a2094")
plt.plot(x, y3, alpha=0.5, color="#5a2094")
plt.plot(x, y4, alpha=0.3, color="#5a2094")
plt.plot(x, y5, alpha=0.1, color="#5a2094")
Log Scale
x = np.linspace(0, 100, 1000)
y = np.exp(x)
plt.plot(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Normal Axes")
plt.show()
plt.plot(x, y)
plt.yscale("log")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Logarithmic Axes")
plt.show()
Use plt.xscale or plt.yscale
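Why does the exponential turn into a straight line on a log axis? Because log(exp(x)) = x, so the logarithm of y grows linearly with x. A small numerical check of that claim (a sketch using a straight-line fit, not part of the original slides):

```python
import numpy as np

x = np.linspace(0, 100, 1000)
y = np.exp(x)

# Fit a straight line to log(y) against x
slope, intercept = np.polyfit(x, np.log(y), 1)

# A slope near 1 and an intercept near 0 mean log(y) is just x
print(slope, intercept)
```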
Saving Plots
You can always drag a plot out of a notebook
You can also save the figure to a file from code
x = np.linspace(-10, 10, 1000)
y = [i**2 for i in x]
plt.plot(x, y)
plt.title("A plot of $f(x) = x^2$")
plt.xlabel("x")
plt.ylabel("$x^2$")
plt.savefig("my_plot.png")
plt.show()
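Note that savefig is called before show(): once the figure has been shown and closed, there may be nothing left to save. The sketch below also uses two optional savefig arguments, `dpi` (resolution) and `bbox_inches="tight"` (trim surrounding whitespace). The temporary directory and the Agg backend are assumptions made so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, x**2)
ax.set_title("A plot of $f(x) = x^2$")

# Write into a throwaway directory; the file extension picks the format
out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "my_plot.png"
pdf_path = out_dir / "my_plot.pdf"
fig.savefig(png_path, dpi=150, bbox_inches="tight")
fig.savefig(pdf_path, bbox_inches="tight")
plt.close(fig)

print(png_path.exists(), pdf_path.exists())
```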
Challenge #1
Try to recreate this figure as exactly as possible!
Figure Size
plt.figure(figsize=(10, 5))
plt.plot(x, y)
plt.show()
plt.figure(figsize=(3, 10))
plt.plot(x, y)
plt.show()
plt.figure(figsize=(X, Y)) sets the figure width X and height Y in inches
💫 You are now a master plot maker!
If you're someone who likes design, you might have noticed they are not so pretty by default.
Let's look at a tool which can help 💅
What is seaborn?
Seaborn is a library built on top of matplotlib
Attempts to make your plots
✨effortlessly pretty✨
just like you
ALSO: Provides a user-friendly high-level interface for making statistical plots
What is seaborn?
Installing seaborn
uv add seaborn
import seaborn as sns
Starting seaborn
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
Using seaborn styles in matplotlib
Now even matplotlib plots will look a bit nicer
Challenge #2
Redo the plot from Challenge #1 but with seaborn active
rng = np.random.default_rng()
height = rng.normal(171, 20, 2000)
sns.displot(height, kde=True, label="Height distribution")
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
plt.title("Height Distribution of 2000 People")
plt.legend()
plt.show()
sns.displot(height, kind="ecdf", label="Cumulative height distribution")
plt.xlabel("Height (cm)")
plt.ylabel("Cumulative frequency")
plt.title("Height Distribution of 2000 People")
plt.legend()
plt.show()
Seaborn histograms
Heatmaps
And many more!
Important Data Formats
Plaintext: .txt .json .csv .xml .yaml .toml (today)
Fancier: .hdf5 .parquet .lmdb .SQL .excel
Text-based data formats
Why so many?
key-value? Like a dict!
Mental model:
.txt → free-form notes/logs
.csv → rows & columns (tabular)
.json/.xml/.yaml/.toml → nested key–value structures (configs, APIs)
Text files - .txt
What: Unstructured plain text; the simplest possible file.
Origins: Since the beginning of time
Used for: notes, logs, docs
Why use it: Opens with any software ever
Caveat: No built-in structure; you must define your own conventions.
Reading text files with Python
with open("lec5.txt", "r") as file:
    data = file.read()
print(data)
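Since a text file has no structure of its own, turning it into data is up to you. A minimal sketch, assuming a hypothetical file `notes.txt` that holds whitespace-separated integers (created here in a temporary directory so the example is self-contained):

```python
import tempfile
from pathlib import Path

# Hypothetical file, written first so there is something to read
path = Path(tempfile.mkdtemp()) / "notes.txt"
path.write_text("12 7 30\n5 5\n", encoding="utf-8")

with open(path, "r", encoding="utf-8") as file:
    text = file.read()

# split() with no argument breaks on any whitespace, including newlines
numbers = [int(token) for token in text.split()]

print(numbers)
print(sum(numbers))
```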
Challenge #3
44 12 96 12 129 120 49 60 38 11 20 478 938 40 102 222 102 23 40 58 40 12 12 12 12 49 60 48 27 37 40 17 172 11 98
CSV
What: Comma-Separated Values; one row per line, commas separate the columns
Origins: 1970s
Used for: Spreadsheets, databases, etc
Why use it: Universal, great for tabular data.
Caveats:
Schema not embedded
Slooooow
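"Schema not embedded" means the file does not say which columns are numbers, so the csv module hands everything back as strings. A sketch using the same rows as the bus.csv example, inlined through io.StringIO (an assumption made to keep the snippet self-contained):

```python
import csv
import io

# Same content as the bus.csv example, held in memory instead of on disk
raw = """date,station,rides
2025-09-05,Central,120
2025-09-05,West,95
2025-09-06,Central,130
2025-09-06,West,105
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Every value is a string until you convert it yourself
print(type(rows[0]["rides"]))

central_total = sum(int(row["rides"]) for row in rows if row["station"] == "Central")
print(central_total)
```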
Reading CSV with Python
import csv
import numpy as np
with open("lec5-data/bus.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))  # one dict per row, keyed by the header line
print(rows)
print(rows[0])
# can also convert to a NumPy array (here: an array of dicts)
data = np.array(rows)
print(data)
Contents of lec5-data/bus.csv:
date,station,rides
2025-09-05,Central,120
2025-09-05,West,95
2025-09-06,Central,130
2025-09-06,West,105
Output of print(data):
[{'date': '2025-09-05', 'station': 'Central', 'rides': '120'}
 {'date': '2025-09-05', 'station': 'West', 'rides': '95'}
 {'date': '2025-09-06', 'station': 'Central', 'rides': '130'}
 {'date': '2025-09-06', 'station': 'West', 'rides': '105'}]
JSON
What: JavaScript Object Notation
Origins: 2000s
Used for: Web APIs
Most data passed from websites to your browser is communicated via JSON
Why use it: Human-readable, things are typed
Caveats: No comments allowed! :(
JSON
{
  "users": [
    {
      "name": "Lady Gaga",
      "email": "lady.gaga@mpp.mpg.de",
      "age": 36,
      "signed_in": true
    },
    {
      "name": "David Hasselhoff",
      "email": "david.hasselhoff@tum.de",
      "age": 25,
      "signed_in": false
    },
    {
      "name": "Johann Sebastian Bach",
      "email": "bach@db.de",
      "age": 300,
      "signed_in": true
    }
  ]
}
# json
import json
with open("userinfo.json", "r") as f:
    data = json.load(f)
print(type(data))
print(data["users"][0])
The type is a dictionary!
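"Things are typed" can be checked directly: JSON numbers and booleans arrive as Python ints and bools, not strings. A sketch using json.loads on an inline string (a trimmed-down version of one user record) rather than a file:

```python
import json

text = '{"name": "Lady Gaga", "age": 36, "signed_in": true}'
record = json.loads(text)

# Each JSON type maps to a matching Python type
print(type(record["name"]))       # JSON string
print(type(record["age"]))        # JSON number
print(type(record["signed_in"]))  # JSON true/false

# json.dumps goes the other way, from Python objects back to JSON text
round_trip = json.loads(json.dumps(record))
print(round_trip == record)
```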
XML
What: eXtensible Markup Language
Origins: W3C 1998
Used for: document formats (e.g., Office .docx inside is XML!), config files, RSS, HTML?
Why use it: uhhhhh....
Caveats: Verbose, usually overkill
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>
    <name>Lady Gaga</name>
    <email>lady.gaga@mpp.mpg.de</email>
    <age>36</age>
    <signed_in>true</signed_in>
  </user>
  <user>
    <name>David Hasselhoff</name>
    <email>david.hasselhoff@tum.de</email>
    <age>25</age>
    <signed_in>false</signed_in>
  </user>
  <user>
    <name>Johann Sebastian Bach</name>
    <email>bach@db.de</email>
    <age>300</age>
    <signed_in>true</signed_in>
  </user>
</users>
XML
This is the same address book as in the JSON example, written as XML
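The slides do not show how to read XML, so here is one possible sketch using the standard library's xml.etree.ElementTree on a shortened, inline copy of the address book. Unlike JSON, every value comes back as text and has to be converted by hand.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the address book (two users, no emails)
xml_text = """<users>
  <user><name>Lady Gaga</name><age>36</age><signed_in>true</signed_in></user>
  <user><name>David Hasselhoff</name><age>25</age><signed_in>false</signed_in></user>
</users>"""

root = ET.fromstring(xml_text)

users = []
for user in root.findall("user"):
    users.append({
        "name": user.findtext("name"),
        "age": int(user.findtext("age")),                # text -> int
        "signed_in": user.findtext("signed_in") == "true",  # text -> bool
    })

print(users)
```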
YAML
What: YAML Ain’t Markup Language
Origins: ~2001
Used for: Python Configs, GitHub Actions, CI/CD
Why choose it: Very readable for humans!
Caveats: indentation sensitivity
YAML
users:
  - name: Lady Gaga
    email: lady.gaga@mpp.mpg.de
    age: 36
    signed_in: true
  - name: David Hasselhoff
    email: david.hasselhoff@tum.de
    age: 25
    signed_in: false
  - name: Johann Sebastian Bach
    email: bach@db.de
    age: 300
    signed_in: true
# uv add pyyaml
import yaml
with open("lec5-data/userinfo.yaml", "r") as f:
    data = yaml.safe_load(f)  # safe_load only builds plain Python types
print(data)
Also loads into a dictionary!
TOML
What: TOML (Tom’s Obvious, Minimal Language)
Origins: 2013, by Tom Preston-Werner (a GitHub co-founder!)
Used for: Python packaging (pyproject.toml)
Why choose it: simple grammar, types
Caveats: Deeply nested structures get verbose
Reading in TOML files
import tomllib
with open("lec5-data/userinfo.toml", "rb") as f:  # tomllib needs the file opened in binary mode
    data = tomllib.load(f)
Again, loads into a dictionary!
tomllib has been included in Python since v3.11!
Cheat Sheet
| File Format | When to use |
|---|---|
| CSV | Tabular data, not too much data |
| JSON | Web/API |
| TOML | Human-readable config |
| YAML | Human-readable config |
| XML | Working with super old machines |
| TXT | Notes, logs, etc |
The End