Shen Shen
Feb 16, 2024
(many slides adapted from Tamara Broderick)
Optimization primer
Gradient, optimality, convexity
GD as an optimization algorithm for a generic function
GD as an optimization algorithm for ML applications
Loss function typically a finite sum (over data)
Stochastic gradient descent (SGD) for ML applications
Pick one data point out of the finite sum
Examples of gradient computations (worked on the slides)
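As a concrete illustration (a sketch, not taken from the slides), an analytic gradient can be sanity-checked against a finite-difference estimate. Here the example function \(f(x) = x_0^2 + 3 x_1\) is an assumption chosen for simplicity:

```python
# Checking an analytic gradient numerically (illustrative example, not from the slides).
# f(x) = x0^2 + 3*x1, so the analytic gradient is [2*x0, 3].

def f(x):
    return x[0] ** 2 + 3 * x[1]

def grad_f(x):
    return [2 * x[0], 3.0]

def numerical_grad(f, x, eps=1e-6):
    """Central finite differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

x = [1.5, -2.0]
print(grad_f(x))             # [3.0, 3.0]
print(numerical_grad(f, x))  # approximately [3.0, 3.0]
```

This kind of check is a common way to catch mistakes in hand-derived gradients.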
5 cases where the gradient is zero: e.g., a global minimum, a local minimum, a saddle point, a local maximum, or a global maximum.
When minimizing a function, we'd hope to get a global min
Simple examples
Convex functions
Non-convex functions
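One way to build intuition (an illustrative sketch, not from the slides) is to spot-check the midpoint form of the convexity inequality, \(f\big(\tfrac{a+b}{2}\big) \le \tfrac{f(a)+f(b)}{2}\), on random point pairs. The functions \(x^2\) (convex) and \(\sin x\) (non-convex) are assumptions chosen for the demo; note a finite check can only refute convexity, never prove it:

```python
# Spot-check the midpoint convexity inequality on random sample pairs.
import math, random

def midpoint_violates(f, a, b):
    # True if f fails the midpoint convexity inequality at (a, b).
    return f((a + b) / 2) > (f(a) + f(b)) / 2 + 1e-12

random.seed(0)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]

convex_bad = sum(midpoint_violates(lambda x: x * x, a, b) for a, b in pairs)
sine_bad = sum(midpoint_violates(math.sin, a, b) for a, b in pairs)
print(convex_bad)  # 0: no violations found for x^2
print(sine_bad)    # positive: sin is not convex
```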
What do we need to know:
Hyperparameters: e.g., step size, number of iterations, initial guess
If the function is not differentiable: can't run gradient descent
If the function is not convex: e.g., get stuck at a saddle point
If no minimum exists: e.g., may not terminate
If the step size is not small enough: see demo, and lab
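The saddle-point failure mode can be seen on a tiny example (an illustrative sketch, not the lecture's demo). For \(f(x, y) = x^2 - y^2\), the origin is a saddle; gradient descent started exactly on the line \(y = 0\) never leaves it and converges to the saddle:

```python
# GD stuck at a saddle: f(x, y) = x^2 - y^2 has a saddle at (0, 0).
# Starting with y = 0 exactly, the y-component of the gradient stays 0,
# so the iterates slide to the saddle and stop improving.

def grad(p):
    x, y = p
    return (2 * x, -2 * y)

p = (1.0, 0.0)   # start on the saddle's attracting direction
eta = 0.1
for _ in range(200):
    g = grad(p)
    p = (p[0] - eta * g[0], p[1] - eta * g[1])

print(p)  # essentially (0.0, 0.0): the saddle, not a minimum
```

Any tiny perturbation in \(y\) would eventually escape, which is why non-convexity makes guarantees fail only "e.g.", not always.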
Recall: need the step size sufficiently small, and to run long enough
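Both requirements show up even on the simplest quadratic. Below is a minimal sketch (the function and step sizes are assumptions for illustration): on \(f(x) = x^2\), with gradient \(2x\), a small step size converges while a step size above 1 makes the iterates blow up:

```python
# Gradient descent on f(x) = x^2 (gradient 2x): why the step size must be
# small enough and the loop must run long enough.

def gd(x0, eta, steps):
    x = x0
    for _ in range(steps):
        x = x - eta * (2 * x)   # x <- x - eta * f'(x)
    return x

print(gd(1.0, eta=0.1, steps=100))   # near 0: converges
print(gd(1.0, eta=1.1, steps=100))   # huge magnitude: step too large, diverges
```

Each update multiplies \(x\) by \(1 - 2\eta\), so convergence needs \(|1 - 2\eta| < 1\), i.e. \(0 < \eta < 1\) here; and even a valid \(\eta\) needs enough iterations to get close to the minimum.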
Stochastic gradient descent (SGD) for ML applications
Pick one out of the finite sum
SGD update: take a gradient step using just one term of the finite sum, for a randomly picked \(i\)
More "random": each SGD step uses a noisy, single-term estimate of the full gradient.
More "demanding": for convergence, the step size must now decay over time rather than stay constant.
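A minimal SGD sketch (illustrative, not the lecture's exact pseudocode): minimize the finite sum \(J(\theta) = \frac{1}{n}\sum_i (\theta - y_i)^2\), whose minimizer is the mean of the \(y_i\). The toy data and the decaying step-size schedule \(0.5/t\) are assumptions for the demo:

```python
# SGD on a finite-sum loss: each step uses the gradient of ONE randomly
# picked summand, 2*(theta - y_i), with a decaying step size.
import random

random.seed(0)
y = [1.0, 2.0, 3.0, 4.0]          # toy "data"; true minimizer is mean(y) = 2.5

theta = 0.0
for t in range(1, 5001):
    i = random.randrange(len(y))   # pick one term of the finite sum at random
    g = 2 * (theta - y[i])         # gradient of that single summand only
    theta -= (0.5 / t) * g         # step size decays over time

print(theta)  # close to 2.5
```

The individual steps are noisy, but with the decaying step size the iterates settle near the minimizer of the full sum.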
We'd love for you to share some lecture feedback.