Intro to Machine Learning
Lecture 3: Gradient Descent Methods
Shen Shen
Feb 16, 2024
(many slides adapted from Tamara Broderick)
Outline
- Recall (Ridge regression) => Why care about GD
- Optimization primer
  - Gradient, optimality, convexity
- GD as an optimization algorithm for a generic function
- GD as an optimization algorithm for ML applications
  - Loss function typically a finite sum (over data)
- Stochastic gradient descent (SGD) for ML applications
  - Pick one out of the finite sum
Recall
- A general ML approach
- Collect data
- Choose hypothesis class, hyperparameters, loss function
- Train (optimize for) a "good" hypothesis by minimizing loss, e.g. ridge regression
- Great when we have analytical solutions
- But we don't always have them (recall the half-pipe)
- Even when we do have analytical solutions, they can be expensive to compute (recall lab 2, Q2.8)
- Want a more general, efficient way! => GD methods
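To make the "analytical solution" concrete, here is a minimal sketch for the simplest possible case: one feature, no offset, and the objective assumed to be mean squared error plus λθ². Setting the derivative to zero gives θ* = Σxy / (Σx² + nλ). The function names and the toy data are illustrative, not taken from the lecture.

```python
def ridge_objective(theta, xs, ys, lam):
    """Mean squared error plus an L2 penalty, for a single scalar parameter."""
    n = len(xs)
    mse = sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return mse + lam * theta ** 2


def ridge_closed_form(xs, ys, lam):
    """Solve dJ/dtheta = 0: theta* = sum(x*y) / (sum(x^2) + n*lam)."""
    n = len(xs)
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + n * lam
    return numerator / denominator


# Toy data generated by y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

theta_no_penalty = ridge_closed_form(xs, ys, lam=0.0)  # recovers the slope 2
theta_ridge = ridge_closed_form(xs, ys, lam=1.0)       # shrunk towards 0
print(theta_no_penalty, theta_ridge)
```

With d features the same idea requires solving a d-by-d linear system, whose cost grows roughly cubically in d with standard dense solvers. That is one reason an iterative method can be attractive.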
Gradient
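- Def: For $f: \mathbb{R}^m \to \mathbb{R}$, its gradient $\nabla f: \mathbb{R}^m \to \mathbb{R}^m$ is defined at the point $p = (x_1, \dots, x_m)$ in m-dimensional space as the vector of partial derivatives
$$\nabla f(p) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(p) \\ \vdots \\ \dfrac{\partial f}{\partial x_m}(p) \end{bmatrix}$$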
e.g.
another example
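As one concrete illustration (an assumed example, not necessarily the one shown in lecture), take f(x1, x2) = x1² + 3·x1·x2. Its partial derivatives are ∂f/∂x1 = 2·x1 + 3·x2 and ∂f/∂x2 = 3·x1. The sketch below compares that hand-derived gradient with a central finite-difference estimate. The helper name `numerical_gradient` and the step `h` are illustrative choices.

```python
def f(p):
    x1, x2 = p
    return x1 ** 2 + 3 * x1 * x2


def grad_f(p):
    """Hand-derived gradient: one partial derivative per coordinate."""
    x1, x2 = p
    return [2 * x1 + 3 * x2, 3 * x1]


def numerical_gradient(func, p, h=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    grad = []
    for j in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        grad.append((func(plus) - func(minus)) / (2 * h))
    return grad


point = [1.0, 2.0]
analytic = grad_f(point)                 # [2*1 + 3*2, 3*1] = [8.0, 3.0]
numeric = numerical_gradient(f, point)   # should agree closely with analytic
print(analytic, numeric)
```

A finite-difference check like this is a common way to catch mistakes in a hand-derived gradient before using it in an optimizer.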
When the gradient is zero, there are 5 cases:
[Figures: the five zero-gradient cases]
When minimizing a function, we'd hope to get a global minimum.
Convex Functions
- A function f on $\mathbb{R}^m$ is convex if any line segment connecting two points of the graph of f lies above or on the graph.
- (f is concave if −f is convex.)
- For convex functions, local minima are all global minima.
Simple examples
[Figure panels: Convex functions; Non-convex functions]
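The line-segment definition of convexity can be probed numerically. The sketch below, for one-dimensional functions only, looks for a pair of points whose connecting chord dips below the graph. The helper names, the grid, and the tolerance are assumptions made for illustration. Finding a violation proves a function is not convex; finding none on a finite grid is only evidence, not a proof.

```python
import math


def violates_convexity(f, a, b, steps=20, tol=1e-12):
    """True if the chord from (a, f(a)) to (b, f(b)) falls below the graph somewhere."""
    for k in range(steps + 1):
        t = k / steps
        x = t * a + (1 - t) * b
        chord = t * f(a) + (1 - t) * f(b)
        if chord < f(x) - tol:
            return True
    return False


def looks_convex(f, grid):
    """Check every pair of grid points; False means a violation was found."""
    return not any(violates_convexity(f, a, b) for a in grid for b in grid)


grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0

square_ok = looks_convex(lambda x: x ** 2, grid)  # convex
abs_ok = looks_convex(abs, grid)                  # convex, though not smooth at 0
sin_ok = looks_convex(math.sin, grid)             # not convex
cube_ok = looks_convex(lambda x: x ** 3, grid)    # not convex on this interval
print(square_ok, abs_ok, sin_ok, cube_ok)
```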
Convex Functions (cont'd)
What do we need to know:
- Intuitive understanding of the definition
- Given a function, be able to determine whether it is convex. (We'll only ever give functions of at most 2D, so a visual check is enough.)
- Understand how (stochastic) gradient descent algorithms behave differently depending on whether the function is convex.
- For this class: the OLS loss function is convex, the ridge regression loss is (strictly) convex, and, later, the cross-entropy loss function is convex too.
[Figures: gradient descent as an optimization algorithm for a generic function, built up step by step; one annotation labels the hyperparameters]
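The gradient descent loop for a generic function can be sketched as follows. This is a minimal sketch under stated assumptions: the update is θ ← θ − η∇f(θ), the loop stops when f changes by less than ε between steps, and the names `theta_init`, `eta`, `epsilon`, and `max_iters` are illustrative rather than the lecture's exact notation. The quadratic test function is also an assumption.

```python
def gradient_descent(f, grad_f, theta_init, eta, epsilon, max_iters=10_000):
    """Repeat theta <- theta - eta * grad_f(theta) until f barely changes."""
    theta = list(theta_init)
    for t in range(1, max_iters + 1):
        g = grad_f(theta)
        new_theta = [th - eta * gj for th, gj in zip(theta, g)]
        if abs(f(new_theta) - f(theta)) < epsilon:
            return new_theta, t
        theta = new_theta
    return theta, max_iters


def f(theta):
    # Convex bowl with its minimum at (3, -1).
    return (theta[0] - 3) ** 2 + 2 * (theta[1] + 1) ** 2


def grad_f(theta):
    return [2 * (theta[0] - 3), 4 * (theta[1] + 1)]


theta_star, steps = gradient_descent(f, grad_f, [0.0, 0.0], eta=0.1, epsilon=1e-12)
print(theta_star, steps)
```

Here the initial point, the step size η, and the tolerance ε play the role of hyperparameters: they are chosen by the user, not learned by the algorithm.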
Gradient descent properties
[Figure: statement of the gradient descent guarantee and the assumptions it relies on]
If an assumption is violated:
- f is not differentiable: can't run gradient descent
- f is not convex: e.g. get stuck at a saddle point
- f has no global minimum: e.g. may not terminate
- step size not sufficiently small, or not run long enough: see demo and lab
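Two of these failure modes are easy to reproduce on toy functions. The sketch below is an illustration under assumptions, not the lecture demo: it runs a fixed number of plain gradient steps on f(x) = x² with a small and a too-large step size, and on the saddle-shaped f(x, y) = x² − y² starting exactly on the x-axis. The function names are hypothetical.

```python
def run_gd(grad, theta, eta, steps):
    """Apply a fixed number of plain gradient descent updates."""
    for _ in range(steps):
        theta = [th - eta * g for th, g in zip(theta, grad(theta))]
    return theta


def grad_bowl(theta):
    # Gradient of f(x) = x^2.
    return [2 * theta[0]]


def grad_saddle(theta):
    # Gradient of f(x, y) = x^2 - y^2, which has a saddle point at the origin.
    return [2 * theta[0], -2 * theta[1]]


# Each step multiplies x by (1 - 2*eta): 0.8 for eta=0.1, -1.2 for eta=1.1.
small_step = run_gd(grad_bowl, [1.0], eta=0.1, steps=50)  # shrinks towards 0
large_step = run_gd(grad_bowl, [1.0], eta=1.1, steps=50)  # grows in magnitude

# Starting with y = 0, the y-gradient is always 0, so the iterates slide
# along the x-axis into the saddle point and stay there.
stuck = run_gd(grad_saddle, [1.0, 0.0], eta=0.1, steps=200)
print(small_step, large_step, stuck)
```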
Gradient descent on ML objective
- ML objective functions have a typical form: a finite sum
$$f(\theta) = \frac{1}{n} \sum_{i=1}^{n} f_i(\theta)$$
- For instance, the MSE we've seen so far:
$$f(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left(\theta^\top x^{(i)} - y^{(i)}\right)^2$$
- Because (gradient of sum) = (sum of gradients), the gradient of an ML objective is:
$$\nabla f(\theta) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\theta)$$
- The gradient of that MSE w.r.t. $\theta$:
$$\nabla f(\theta) = \frac{2}{n} \sum_{i=1}^{n} \left(\theta^\top x^{(i)} - y^{(i)}\right) x^{(i)}$$
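The "gradient of a sum is the sum of gradients" idea translates directly into code: compute one gradient per data point and average them. The sketch below assumes a linear hypothesis θᵀx with squared error, uses a constant feature of 1 to stand in for an offset, and runs plain gradient descent on made-up data generated by y = 2·x + 1. All names and numbers are illustrative.

```python
def predict(theta, x):
    return sum(t * xj for t, xj in zip(theta, x))


def mse(theta, X, y):
    n = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / n


def grad_single(theta, xi, yi):
    """Gradient of one term f_i(theta) = (theta^T x_i - y_i)^2."""
    residual = predict(theta, xi) - yi
    return [2 * residual * xj for xj in xi]


def grad_mse(theta, X, y):
    """Average of the per-example gradients."""
    n = len(X)
    total = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        gi = grad_single(theta, xi, yi)
        total = [a + b for a, b in zip(total, gi)]
    return [a / n for a in total]


# Toy data: y = 2*x + 1; the second feature is a constant 1.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]

initial_gradient = grad_mse([0.0, 0.0], X, y)

theta = [0.0, 0.0]
eta = 0.05
for _ in range(2000):
    g = grad_mse(theta, X, y)
    theta = [t - eta * gj for t, gj in zip(theta, g)]
print(initial_gradient, theta, mse(theta, X, y))
```

Note that every single update touches all n data points, which is what becomes costly when n is large.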
Stochastic gradient descent
$$\theta_t = \theta_{t-1} - \eta \, \nabla f_i(\theta_{t-1})$$
for a randomly picked i
[Figures: SGD and GD compared]
- SGD: more "random"
- GD: more "demanding"
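A minimal SGD sketch, under the same assumptions as a squared-error linear model: at each step, pick one index i uniformly at random and step along the gradient of that single term only. The toy data, the fixed seed, and the constant step size are illustrative choices. A constant step works here because the data is noiseless; in general a decreasing step-size schedule is usually needed for convergence guarantees.

```python
import random


def predict(theta, x):
    return sum(t * xj for t, xj in zip(theta, x))


def grad_single(theta, xi, yi):
    """Gradient of one term f_i(theta) = (theta^T x_i - y_i)^2."""
    residual = predict(theta, xi) - yi
    return [2 * residual * xj for xj in xi]


def sgd(X, y, theta_init, eta, num_steps, seed=0):
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    theta = list(theta_init)
    n = len(X)
    for _ in range(num_steps):
        i = rng.randrange(n)       # pick one term out of the finite sum
        g = grad_single(theta, X[i], y[i])
        theta = [t - eta * gj for t, gj in zip(theta, g)]
    return theta


# Toy data: y = 2*x + 1; the second feature is a constant 1.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta_sgd = sgd(X, y, [0.0, 0.0], eta=0.05, num_steps=5000)
print(theta_sgd)
```

Each SGD step costs one per-example gradient instead of n of them, at the price of a noisier path towards the minimum.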
Thanks!
We'd love it for you to share some lecture feedback.