Functions
Inputs ➡️ f ➡️ Outputs
def c_to_f(celsius):
    return (celsius * 9/5) + 32
for temp in [0, 20, 100]:
    print(temp, "°C =", c_to_f(temp), "°F")
Type hints
def scream(word: str, times: int) -> str:
    saying = word * times
    return saying.upper() + "!"
def scale_numbers(numbers: list[int], factor: int | float) -> list[float]:
    result = []
    for n in numbers:
        result.append(n * factor)
    return result
Docstrings
def scream(word: str, times: int) -> str:
    """
    Repeat a word several times and return it in uppercase with an exclamation mark.

    Parameters
    ----------
    word : str
        The word to repeat.
    times : int
        How many times to repeat the word.

    Returns
    -------
    str
        The repeated word in uppercase, followed by an exclamation mark.

    Examples
    --------
    >>> scream("ha", 3)
    'HAHAHA!'
    """
    saying = word * times
    return saying.upper() + "!"
There are many ways to format docstrings
This one is called the NumPy style
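For comparison, here is a sketch of the same function documented in the Google style, another widely used convention. The name `scream_google` is made up for this example so that it does not clash with the function above.

```python
def scream_google(word: str, times: int) -> str:
    """Repeat a word several times and return it in uppercase with an exclamation mark.

    Args:
        word: The word to repeat.
        times: How many times to repeat the word.

    Returns:
        The repeated word in uppercase, followed by an exclamation mark.

    Examples:
        >>> scream_google("ha", 3)
        'HAHAHA!'
    """
    saying = word * times
    return saying.upper() + "!"


# The docstring is stored on the function itself and is what help() displays
print(scream_google.__doc__)
```

Whichever style you pick, the important part is to use one style consistently across a project.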
Exceptions
def circle_area(radius: float) -> float:
    if radius < 0:
        raise ValueError("Radius cannot be negative.")
    return 3.1415 * radius ** 2
import time
time.sleep(5)
# or you can import just one function
from time import sleep
sleep(5)
# nickname a module/function
import time as t
t.sleep(5)
Python Builtin Functions and Modules
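As a small illustration of the heading above: built-in functions such as `len`, `max`, `sorted`, and `sum` are always available, while standard-library modules such as `math` and `statistics` ship with Python but must be imported first. The temperature values below are made up for the example.

```python
import math
import statistics

temps = [21.5, 19.0, 23.25, 18.0]  # made-up sample data

# Built-in functions: no import needed
print(len(temps))      # number of items
print(max(temps))      # largest value
print(sorted(temps))   # a new list, sorted in ascending order
print(sum(temps))      # total of all values

# Standard-library modules: import first, then call their functions
print(math.sqrt(16))           # 4.0
print(statistics.mean(temps))  # average of the values
```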
class Character:
    """A character in our video game."""
    def __init__(self, name: str, base_health: int):
        # ...
        self.name = name
        self.base_health = base_health
        self.items: list[str] = []
    def pickup_item(self, item: str):
        """
        Add an item to the character's inventory.

        Parameters
        ----------
        item : str
            The item to pick up
        """
        self.items.append(item)
Character
name
base_health
items
pickup_item()
warrior = Character(name="Hodor", base_health=100)
print(warrior.items)
# []
warrior.pickup_item("sword")
print(warrior.items)
# ['sword']
Objects
"nouns"
"verbs"
Debugging
Debugging in VS Code
Project Scaffolding
Let's work through it together!
1. Create a new folder 'ds-lecture-4' in your projects directory. This will be the root of today's work.
2. Open this folder with VS Code
3. Run the command:
uv init -p 3.13
By the end you should see these files
4. Run the command:
uv sync
- this will create the virtual environment
5. If the following popup appears, always say yes!
- if it doesn't appear, use the command palette
Everything here is review from Lecture 2 :)
Adding a new project to your GitHub
6. Use the command palette to select "Publish to GitHub"
- you can choose whether you want your code to be public or private
Your repository is now available at github.com/username/ds-lecture-4 ✨
Adding external packages
7. Let's install the 'tqdm' package to our environment
uv add tqdm
tqdm got added to pyproject.toml!
8. Try using tqdm in a small script to test that it worked
from tqdm import tqdm
for i in tqdm(range(30000000)):
    pass
9. Sync our changes to GitHub
- Now we can use the VS Code UI to save time ✨
And that's how you start a project and install packages with uv!
Python Notebooks
Ways to interact with Python code
| | Interactive | Sharable | Version Control friendly | Reproducible | Mix code, text, plots |
|---|---|---|---|---|---|
| REPL | ✅ | ❌ | ❌ | ❌ | ❌ |
| Scripts | ❌ | ✅ | ✅ | ✅ | ❌ |
| Notebooks | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
Python Notebooks
⚠️ Reproducing results
⚠️ Tracking changes with Git
⚠️ Scalability
⚠️ Performance
✅ Playing with data
✅ Taking notes
✅ Making plots
✅ Interactive
✅ Good for demos
Notebooks in VS Code
Do it!
ipynb = Interactive Python Notebook
When you first try to run a cell with code, it will ask you to select your interpreter again.
Remember to use the one located at .venv
Add a code cell
Add a markdown cell
Run all notebook cells sequentially
Run this cell only
Delete this cell
Switch this cell between python/markdown
Restart the notebook (all imports and variables lost)
Linear Algebra
The math of vectors and matrices
How to combine and transform them
What is a vector?
A vector is a list of numbers.
That's it ✨
Many concepts and physical quantities can be expressed in terms of vectors
Coordinates: [x, y]
Data: [4, 7, 8, 2]
Equations: [4, 2, 1]
What is a vector?
Vectors don't live in isolation:
there is usually some underlying information
Coordinates: [x, y]. These are spatial coordinates, and the origin is assumed to be [0, 0].
Data: [4, 7, 8, 2]. The units are luminosity, and the data is captured every 1 s.
Equations: [4, 2, 1]. The numbers represent the coefficients of a polynomial.
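To make the point about underlying information concrete, here is a sketch that reads the vector `[4, 2, 1]` as polynomial coefficients. It assumes the coefficients are listed from the highest power down, so the vector stands for 4x² + 2x + 1; the slide does not state the ordering, and `evaluate_polynomial` is a name invented for this example.

```python
def evaluate_polynomial(coefficients: list[float], x: float) -> float:
    """Evaluate a polynomial whose coefficients are listed highest power first."""
    result = 0.0
    for c in coefficients:
        # Horner's method: multiply the running total by x, then add the next coefficient
        result = result * x + c
    return result


coefficients = [4, 2, 1]  # assumed to mean 4x^2 + 2x + 1

print(evaluate_polynomial(coefficients, 0))  # 1.0
print(evaluate_polynomial(coefficients, 2))  # 21.0
```

The same three numbers would mean something entirely different if they were read as coordinates or as measurements, which is why the context matters.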
Vector Math at a Glance
addition / subtraction
vector multiplication
dot product
magnitude / norm
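The formulas on this slide did not survive as text, so here is a plain-Python sketch of the four operations. It assumes that "vector multiplication" means element-wise multiplication; the vectors `a` and `b` are made up.

```python
import math

a = [1, 2, 3]
b = [4, 5, 6]

# addition / subtraction: combine matching elements
added = [x + y for x, y in zip(a, b)]        # [5, 7, 9]
subtracted = [x - y for x, y in zip(a, b)]   # [-3, -3, -3]

# element-wise multiplication (an assumption about what the slide shows)
multiplied = [x * y for x, y in zip(a, b)]   # [4, 10, 18]

# dot product: multiply matching elements, then add everything up
dot = sum(x * y for x, y in zip(a, b))       # 32

# magnitude / norm: square root of the sum of squares
norm_a = math.sqrt(sum(x * x for x in a))    # sqrt(14)

print(added, subtracted, multiplied, dot, norm_a)
```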
Challenge #1
Pen and paper exercises
find:
What is a Matrix?
Multiple vectors stacked together.
This can represent higher-order information
Coordinates ➡️ many points at once
pair of coordinates
Matrix as a transformation
Matrix as data with attributes
Each row in the matrix is one data point
Parameters of multiple functions
What is a Tensor?
Multiple matrices stacked together.
A higher-order matrix.
Rank 0 Tensor
Rank 1 Tensors
Rank 2 Tensor
Rank 3 Tensors
Everything is a tensor
Color images are tensors!
Graphics from: St. Lawrence U CS140 and Montesinos-López et al (2022)
Image width (4px)
Image height (4px)
Image "depth" (3 color channels)
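As a sketch of the ranks listed above, using NumPy (introduced later in this lecture): `ndim` reports the number of axes and `shape` the size along each axis. The height, width, channel ordering of the image array is an assumption that matches the labels on the slide.

```python
import numpy as np

scalar = np.array(7)                  # rank 0
vector = np.array([1, 2, 3])          # rank 1
matrix = np.array([[1, 2], [3, 4]])   # rank 2

# A 4x4 pixel colour image: height x width x 3 colour channels (rank 3)
image = np.zeros((4, 4, 3))

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("image", image)]:
    # ndim is the number of axes, shape is the size along each axis
    print(name, t.ndim, t.shape)
```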
Matrix Math at a Glance
matrix x vector
matrix multiplication
transpose
*flips matrix along main diagonal*
it's reversible: transposing twice gives back the original matrix
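The formulas for these operations were images on the slides. A minimal sketch with made-up matrices, using NumPy (introduced later in this lecture), shows how the shapes behave:

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # shape (3, 2)
v = np.array([10, 1])        # shape (2,)
N = np.array([[1, 0, 2],
              [0, 1, 3]])    # shape (2, 3)

# matrix x vector: each row of M is dotted with v
print(M @ v)                 # [12 34 56]

# matrix multiplication: (3, 2) @ (2, 3) gives (3, 3)
print((M @ N).shape)         # (3, 3)

# transpose: rows become columns
print(M.T.shape)             # (2, 3)

# transposing twice returns the original matrix
print(np.array_equal(M.T.T, M))  # True
```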
Now let's put it all together:
jupyter + linear algebra
✨
NumPy
💻 The rest of today's lecture will be spent learning NumPy.
Open up a new notebook which you can use for note-taking and completing challenges
Please install NumPy with
uv add numpy
What is NumPy?
NumPy is the array library that powers most of scientific Python
Fancy lists
NumPy Superpowers:
⚡️ Vectorization
⚡️ Broadcasting
⚡️ Fassssst
We'll come to these later
Lists need loops
my_list = [1,2,3,4,5]
print(my_list ** 2)
# raises a TypeError!
Solution using lists
my_list = [1,2,3,4,5]
squared = []
for i in my_list:
    squared.append(i**2)
# OR
squared = [i**2 for i in my_list]
print(squared)
Solution using numpy
import numpy as np
my_array = np.array([1,2,3,4,5])
print(my_array**2)
Vectorization
import numpy as np
my_array = np.array([1,2,3,4,5])
print(my_array**2)
# [ 1  4  9 16 25]
This is called vectorization ✨
Operations on vectorized objects are applied element-wise:
print(my_array + 5)
# [ 6  7  8  9 10]
other = np.array([6,7,8,9,10])
print(my_array + other)
# [ 7  9 11 13 15]
What is NumPy?
NumPy is the array library that powers most of scientific Python
Fancy lists
Fancy linear algebra
You can put arrays in arrays to create/manipulate vectors, matrices, and tensors
Later we'll see how it makes sense to manipulate data as vectors and matrices
This will be super important for the ML section next week as well!
Challenge #1
With normal lists
my_list = [1, 2, 3, 4, 5]
other_list = [6, 7, 8, 9, 10]
With numpy arrays
import numpy as np
my_array = np.array([1, 2, 3, 4, 5])
other = np.array([6, 7, 8, 9, 10])
# expected result
[ 6 14 24 36 50 ]
Next up: 1D arrays (aka Vectors)
Array Attributes
Easily get the type or shape of an array
my_array = np.array([1,2,3,4,5])
print(my_array.dtype) # int64
print(my_array.shape) # (5,)
my_array = np.array([1,2,3,4,5], dtype=float)
print(my_array.dtype) # float64
Array Creation Patterns
print(np.ones(6))
# [1. 1. 1. 1. 1. 1.]
You can also create arrays that are the same size as an already-existing one
my_array = np.array([1,2,3,4,5])
print(np.ones_like(my_array))
# [1 1 1 1 1]
Initializing With Sequences
Random Arrays
my_array = [6, 89, 384]
rng = np.random.default_rng()
rng.choice(my_array)
# e.g. 89
# can pass probabilities
rng.choice(my_array, p=[0.1, 0.1, 0.8])
# 384 (probably ;)
Challenge #2
Create a function that uses numpy to generate n random numbers between -1 and +1
# expected result
print(my_rand(n=3))
# e.g. [-0.348, 0.894, -0.028]
Reproducible Randomness
No seed provided
rng = np.random.default_rng()
print(f"Seed: {rng.bit_generator.seed_seq.entropy}")
print(rng.random(5))
# Seed: 121739455997546233386808762441738277564
# [0.7538113 0.33547651 0.08440574 0.97809579 0.06358138]
rng2 = np.random.default_rng()
print(f"Seed: {rng2.bit_generator.seed_seq.entropy}")
print(rng2.random(5))
# Seed: 187878457149970364692539552433862812938
# [0.24586211 0.91028715 0.23901012 0.56294228 0.25043037]
Seed provided: 42
rng = np.random.default_rng(42) # <--- seed is 42
print(f"Seed: {rng.bit_generator.seed_seq.entropy}")
print(rng.random(5))
# Seed: 42
# [0.77395605 0.43887844 0.85859792 0.69736803 0.09417735]
rng2 = np.random.default_rng(42)
print(f"Seed: {rng2.bit_generator.seed_seq.entropy}")
print(rng2.random(5))
# Seed: 42
# [0.77395605 0.43887844 0.85859792 0.69736803 0.09417735]
Vector Indexing
Not too different from lists
Main difference: a slice of an array is a view, so editing the slice changes the original!
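A short sketch of that difference, assuming the point being made is NumPy's view behaviour: slicing a list makes a copy, while slicing an array gives a view onto the same data. Use `.copy()` when an independent array is needed.

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]
list_slice = my_list[1:3]   # slicing a list makes a copy
list_slice[0] = 99
print(my_list)              # [1, 2, 3, 4, 5] (unchanged)

arr = np.array([1, 2, 3, 4, 5])
arr_slice = arr[1:3]        # slicing an array gives a view
arr_slice[0] = 99
print(arr)                  # [ 1 99  3  4  5] (changed!)

safe = arr[1:3].copy()      # an explicit copy is independent
safe[0] = -1
print(arr)                  # [ 1 99  3  4  5] (not affected by the copy)
```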
a = np.arange(1, 6)
# [1 2 3 4 5]
a[2:4] = 0
print(a)
# [1 2 0 0 5]
Boolean Indexing
Logical operations on arrays
# boolean indexing
a = np.array([1, 2, 3, 4, 5, 4, 3, 2, 1])
print(a > 3)
# [False False False  True  True  True False False False]
a[a > 3] = 0
print(a)
# [1 2 3 0 0 0 3 2 1]
Vector Math
Element-wise operations between two arrays
Element-wise operations between one array and one number (scalar)
This is called broadcasting ✨
In these cases, the scalar gets 'promoted' to an array behind the scenes
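A minimal sketch of that "promotion": adding a scalar gives the same result as adding an array filled with the scalar. NumPy does this without actually building the larger array in memory; `np.broadcast_to` just shows what the promoted scalar looks like.

```python
import numpy as np

x = np.array([1, 2, 3])

# Adding a scalar...
print(x + 2)                        # [3 4 5]

# ...gives the same result as adding an array filled with that scalar
twos = np.full_like(x, 2)           # [2 2 2]
print(x + twos)                     # [3 4 5]

# np.broadcast_to shows what the "promoted" scalar looks like
print(np.broadcast_to(2, x.shape))  # [2 2 2]
```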
Most functions from the built-in `math` module are also in NumPy and work on arrays!
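For example, `math.sqrt` takes one number at a time, while `np.sqrt` is applied to every element of an array. The values below are made up.

```python
import math
import numpy as np

x = np.array([1.0, 4.0, 9.0])

# math.sqrt only accepts a single number
print(math.sqrt(4.0))    # 2.0

# np.sqrt is applied to every element of the array
print(np.sqrt(x))        # [1. 2. 3.]

# Many other math functions and constants have NumPy counterparts
print(np.exp(0.0), np.sin(0.0), np.pi == math.pi)
```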
Linear algebra 😱
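The code for this slide was not captured in the text, so here is a sketch of the two vector operations from earlier in the lecture (dot product and norm) written with NumPy. The vectors are made up.

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# dot product: 3*1 + 4*2
print(np.dot(a, b))        # 11.0
print(a @ b)               # same result with the @ operator

# magnitude / norm: sqrt(3^2 + 4^2)
print(np.linalg.norm(a))   # 5.0
```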
Stats! 😱
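Again the slide's code is missing from the text; a sketch of common summary statistics on a made-up array might look like this:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 6.0])

print(x.min(), x.max())   # 2.0 6.0
print(x.sum())            # 16.0
print(x.mean())           # 4.0
print(np.median(x))       # 4.0
print(x.std())            # population standard deviation (divides by n)
```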
Challenge #3
Vectors / 1D arrays review
Create an array
x = np.array([1,2,3,4,5])
# OR
l = [1, 2, 3, 4, 5]
x = np.array(l)
Initialize an array
x = np.ones(5)
x = np.zeros(5)
Initialize from another array's shape
x = np.array([1,2,3,4,5])
y = np.ones_like(x)
Initialize a sequence
x = np.arange(6)
x = np.arange(1, 6, 2)
x = np.linspace(0, 1, 11)
Random number generator
rng = np.random.default_rng()
# 5 ints in [0, 100)
rng.integers(0, 100, 5)
# 10 floats in [0, 1)
rng.random(10)
# 6 samples from a gaussian
# mean=5, std=3
rng.normal(5, 3, 6)
Indexing
x = np.arange(1, 6)
# [1, 2, 3, 4, 5]
x[1] # = 2
x[2:4] # = [3, 4]
# everything from index -2 and onwards
x[-2:] # = [4, 5]
# every 2 indices (step=2)
x[::2] # = [1, 3, 5]
# specifically indices 1, 3, 4
x[[1, 3, 4]] # = [2, 4, 5]
Vector operations
x = np.array([1, 2])
y = np.array([3, 6])
x + y # = [4, 8]
x * y # = [3, 12]
x + 2 # = [3, 4]
np.sqrt(x)
np.sin(x)
Stats
x.max()
x.sum()
x.mean()
# etc
Matrices, 2D Arrays
Matrices
In NumPy, matrices are arrays inside an array!
1D Array / Vector
x = np.array([1, 2, 3])
x.shape # (3,)
2D Array / Matrix
x = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
x.shape # (2, 3)
("rows", "columns")
Matrix Initialization
Most logic from the 1D case carries over to 2D
Indexing syntax
[row, column]
: means 'all'
(3, 4)
First row, every element
Every row, second element
First row, second col
Let's pretend this matrix represents student grades
| | Trig. | Alg. | Geom. | Calc. |
|---|---|---|---|---|
| Susie | 1.3 | 1.3 | 3.7 | 2.3 |
| Jay | 4.0 | 4.3 | 2.0 | 2.3 |
| Lara | 1.3 | 1.0 | 2.0 | 3.0 |
Jay's Geometry Grade
All Jay's Grades
All Geometry Grades
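The indexing expressions on these slides were images. A sketch of what they likely correspond to, assuming the rows are ordered Susie, Jay, Lara and the columns follow the table header:

```python
import numpy as np

# rows: Susie, Jay, Lara; columns: Trig., Alg., Geom., Calc.
grades = np.array([
    [1.3, 1.3, 3.7, 2.3],
    [4.0, 4.3, 2.0, 2.3],
    [1.3, 1.0, 2.0, 3.0],
])

print(grades.shape)   # (3, 4)
print(grades[0, :])   # first row, every element (all of Susie's grades)
print(grades[:, 1])   # every row, second element (all Algebra grades)
print(grades[0, 1])   # first row, second column (Susie's Algebra grade)

print(grades[1, 2])   # Jay's Geometry grade: 2.0
print(grades[1, :])   # all of Jay's grades
print(grades[:, 2])   # all Geometry grades
```

Remember that indices start at 0, so Jay is row 1 and Geometry is column 2.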
The Axis Argument
grades = np.array([
    [1.3, 1.3, 3.7, 2.3],
    [4.0, 4.3, 2.0, 2.3],
    [1.3, 1.0, 2.0, 3.0]
])
| | Trig. | Alg. | Geom. | Calc. |
|---|---|---|---|---|
| Susie | 1.3 | 1.3 | 3.7 | 2.3 |
| Jay | 4.0 | 4.3 | 2.0 | 2.3 |
| Lara | 1.3 | 1.0 | 2.0 | 3.0 |
grades.sum() = 28.5 (every element)
grades.sum(axis=0) = [6.6, 6.6, 7.7, 7.6] (down the rows: one total per subject)
grades.sum(axis=1) = [8.6, 12.6, 7.3] (across the columns: one total per student)
Tabular data (i.e. row/column data) is naturally represented as a 2D matrix!
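The same `axis` argument works for other reductions too. As a sketch, here are the average grade per subject and per student for the table above; the `students` and `subjects` lists are helpers added for this example.

```python
import numpy as np

grades = np.array([
    [1.3, 1.3, 3.7, 2.3],
    [4.0, 4.3, 2.0, 2.3],
    [1.3, 1.0, 2.0, 3.0],
])
students = ["Susie", "Jay", "Lara"]             # one name per row
subjects = ["Trig.", "Alg.", "Geom.", "Calc."]  # one name per column

# axis=0 collapses the rows: one value per column (subject)
subject_means = grades.mean(axis=0)
# axis=1 collapses the columns: one value per row (student)
student_means = grades.mean(axis=1)

for subject, m in zip(subjects, subject_means):
    print(f"{subject}: {m:.2f}")
for student, m in zip(students, student_means):
    print(f"{student}: {m:.2f}")
```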
Challenge #4
Copy the data from the code block below
fruits = np.array([
    "apples",
    "bananas",
    "cherries",
    "dates",
    "elderberries",
    "figs",
    "grapes",
    "huckleberries",
    "kiwis",
    "lemons",
    "mangos",
    "nectarines",
])
# amount sold per weekday
amounts = np.array([
    [10, 20, 30, 40, 50, 60, 70],
    [15, 25, 35, 45, 55, 65, 75],
    [20, 30, 40, 50, 60, 70, 80],
    [25, 35, 45, 55, 65, 75, 85],
    [30, 40, 50, 60, 70, 80, 90],
    [10, 24, 30, 36, 42, 48, 54],
    [10, 20, 30, 40, 50, 60, 70],
    [4, 8, 12, 16, 20, 24, 28],
    [1, 2, 3, 4, 5, 6, 7],
    [3, 5, 7, 9, 11, 13, 15],
    [1, 4, 7, 10, 13, 16, 19],
    [34, 12, 23, 12, 23, 12, 23],
])
Note: The week runs from Monday to Sunday
Matrix Math
All element-wise except @ (matrix multiplication)
More Matrix Math
Normalization
Multiplying several columns
Row-wise normalization
Outer Product
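The worked examples for these headings were images on the slides. The sketch below is one plausible reading of each heading with made-up numbers, not a reconstruction of the original code; in particular, "multiplying several columns" is interpreted here as scaling each column by its own factor.

```python
import numpy as np

m = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [4.0, 0.0]])

# Normalization: divide every element by the overall total
normalized = m / m.sum()

# Row-wise normalization: divide each row by its own total.
# keepdims=True keeps the shape (3, 1) so it broadcasts across the columns.
row_normalized = m / m.sum(axis=1, keepdims=True)
print(row_normalized.sum(axis=1))   # every row now sums to 1

# Multiplying several columns: scale each column by its own factor
factors = np.array([10.0, 100.0])   # shape (2,), one factor per column
print(m * factors)

# Outer product: every element of u times every element of w
u = np.array([1, 2, 3])
w = np.array([10, 20])
print(np.outer(u, w))               # shape (3, 2)
```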
This all must seem very overwhelming, like a lot to remember!
But you don't have to memorize any of this.
You will get used to this notation as you use it, and can look things up as you go along 💫
Creating Matrices from Arrays
When you have a single, long array and you want to turn it into a matrix:
Use the reshape() method!
a.reshape(2, 3)
Go back with the flatten() method!
a.flatten()
Row and Column Vectors
You can use the @ operator to perform dot/outer products on 2 vectors
But how does numpy know if the vector is a row vector or a column vector?
We must encode the 1D vectors as 2D vectors
(3,1) @ (1,3) -> (3x3)
(1,3) @ (3,1) -> (1x1)
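A minimal sketch of the two shape rules above, using a made-up vector:

```python
import numpy as np

v = np.array([1, 2, 3])    # plain 1D vector, shape (3,)

col = v.reshape(-1, 1)     # column vector, shape (3, 1)
row = v.reshape(1, -1)     # row vector, shape (1, 3)

outer = col @ row          # (3, 1) @ (1, 3) -> (3, 3)
inner = row @ col          # (1, 3) @ (3, 1) -> (1, 1)

print(outer)
print(inner)               # [[14]]

# With two plain 1D vectors, @ returns the dot product as a scalar
print(v @ v)               # 14
```

The `-1` in `reshape` tells NumPy to work out that dimension from the array's length.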
np.array([1, 2, 3, 4])
Normal 1D Vector (4,)
4 elements along a single axis (neither a row nor a column)
np.array([[1, 2, 3, 4]])
Row 2D Vector (1, 4)
1 row, 4 columns
Notice how this has double brackets, making it a 2D matrix with 1 row!
np.array([[1], [2], [3], [4]])
Column 2D Vector (4, 1)
4 rows, 1 column
Converting between them:
1D vector to row vector: x.reshape(1, -1)
1D vector to column vector: x.reshape(-1, 1)
Row vector to column vector (and back): x.T
Row or column vector to 1D vector: x.flatten()
Challenge #5
Let's play around with tensors:
One last (useful) thing!
Saving and loading numpy arrays
import numpy as np
a = np.arange(6).reshape(2,3)
np.save("my_array.npy", a) # save to file
b = np.load("my_array.npy") # load back
print(b)
print(f"They are equal: {np.all(a == b)}")
Surprise mini-challenge:
Why did I use np.all() in the above code?
The End