Optimization

A very very short introduction

It all starts with a function

$$ z = f(x, y) $$

What (x, y) values will minimize my function?

$$ f(\bm{x}) $$

What can you tell me about your function?

Convexity

Convexity guarantees that any local minimum is a global minimum (strict convexity guarantees it is unique)

Differentiability 

$$ f^\prime(a) = \lim\limits_{h \rightarrow 0} \frac{f(a + h) - f(a)}{h} $$
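
As a quick numerical check of this definition, here is a finite-difference sketch (the test function, point, and step size h are arbitrary choices):

```python
def numerical_derivative(f, a, h=1e-5):
    """Forward-difference approximation of f'(a), following the limit definition."""
    return (f(a + h) - f(a)) / h

# Example: f(x) = x**2, whose exact derivative at a = 3 is 6.
f = lambda x: x**2
print(numerical_derivative(f, 3.0))   # ~6.00001
```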

The idea behind gradient descent

Follow the slope!

$$ \nabla f = \left[\begin{matrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{matrix}\right] $$
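
The same finite-difference idea extends to the gradient, one partial derivative at a time; a minimal sketch (the test function is an arbitrary choice):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x by perturbing one coordinate at a time."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x)) / h
    return grad

# Example: f(x, y) = x**2 + 3*y, whose gradient is [2x, 3].
f = lambda p: p[0]**2 + 3 * p[1]
print(numerical_gradient(f, np.array([1.0, 2.0])))   # ~[2. 3.]
```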

Gradient descent algorithm

Start from some point x

  1. Compute the direction of the gradient
  2. Take a step in that direction
  3. Go back to 1
$$ \bm{x_{n+1}} = \bm{x_{n}} - \mu \nabla f(\bm{x_{n}}) $$
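
A minimal sketch of this update rule (the quadratic test function, step size \( \mu \), and iteration count are arbitrary choices):

```python
import numpy as np

def gradient_descent(grad_f, x0, mu=0.1, n_steps=100):
    """Iterate x_{n+1} = x_n - mu * grad_f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - mu * grad_f(x)
    return x

# Example: f(x, y) = (x - 1)**2 + (y + 2)**2, minimized at (1, -2).
grad_f = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] + 2)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))   # ~[ 1. -2.]
```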

How big of a step should I take?

$$ \bm{x_{n+1}} = \bm{x_{n}} - {\color{red} \mu} \nabla f(\bm{x_{n}}) $$
  • Fixed step:
    • If \( \nabla f \) is L-Lipschitz, convergence is guaranteed for \( \mu < 1/L \)
  • Line search: \( \min\limits_{\mu > 0} f( x - \mu \nabla f(x)) \)
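
A sketch of the line-search variant, picking \( \mu \) at each iteration with scipy.optimize.minimize_scalar (the test function and the upper bound on \( \mu \) are arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent_line_search(f, grad_f, x0, n_steps=20):
    """At each step, choose mu by minimizing f(x - mu * grad_f(x)) over mu in (0, 1]."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_f(x)
        best = minimize_scalar(lambda mu: f(x - mu * g), bounds=(0.0, 1.0), method='bounded')
        x = x - best.x * g
    return x

# Example: same bowl as before, minimized at (1, -2).
f = lambda p: (p[0] - 1)**2 + (p[1] + 2)**2
grad_f = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] + 2)])
print(gradient_descent_line_search(f, grad_f, x0=[0.0, 0.0]))   # ~[ 1. -2.]
```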

Let's have a look at scipy.optimize
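
For example, scipy.optimize.minimize wraps several of these algorithms behind one interface; a minimal run on the same bowl (the starting point and the choice of method='CG', nonlinear conjugate gradient, are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

# Same bowl: f(x, y) = (x - 1)**2 + (y + 2)**2, minimum at (1, -2).
f = lambda p: (p[0] - 1)**2 + (p[1] + 2)**2
grad_f = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] + 2)])

result = minimize(f, x0=[0.0, 0.0], jac=grad_f, method='CG')
print(result.x, result.success)   # ~[ 1. -2.] True
```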

Newton's method

Newton and quasi-Newton

  • Newton's update:

$$ x_{n+1} = x_{n} - [ \nabla^2 f(x_n) ]^{-1} \nabla f(x_n) $$

where \( \nabla^2 f \) is the Hessian:

$$ \nabla^2 f = \left[\begin{matrix} \frac{\partial^2 f}{\partial x \partial x} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y \partial y} \end{matrix}\right] $$
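
A minimal sketch of this update, solving the linear system rather than explicitly inverting the Hessian (the quadratic test function is an arbitrary choice, so Newton converges in a single step):

```python
import numpy as np

def newton(grad_f, hess_f, x0, n_steps=10):
    """Iterate x_{n+1} = x_n - [hess f(x_n)]^{-1} grad f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - np.linalg.solve(hess_f(x), grad_f(x))
    return x

# Example: f(x, y) = 2*x**2 + x*y + y**2, minimized at (0, 0).
grad_f = lambda p: np.array([4 * p[0] + p[1], p[0] + 2 * p[1]])
hess_f = lambda p: np.array([[4.0, 1.0], [1.0, 2.0]])
print(newton(grad_f, hess_f, x0=[3.0, -2.0]))   # ~[0. 0.]
```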

Computing and inverting the Hessian can be very costly; quasi-Newton methods work around it.
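
For instance, BFGS, a quasi-Newton method in scipy.optimize, only needs gradient evaluations and builds up its own approximation of the (inverse) Hessian:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Rosenbrock function (minimum at [1, 1, 1, 1, 1]); no Hessian is supplied.
x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
result = minimize(rosen, x0, jac=rosen_der, method='BFGS')
print(result.x)    # ~[1. 1. 1. 1. 1.]
print(result.nit)  # number of iterations used
```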

Why is this useful?

Deep Neural Networks

Example loss function for regression:
$$ L = \parallel y - f_{w}(x) \parallel^2 $$

$$ L = \sum\limits_{i=1}^{N} (y_i - f_w(x_i))^2$$ 

Stochastic Gradient Descent
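
A minimal SGD sketch for the summed squared-error loss above, with a one-parameter linear model \( f_w(x) = w x \) standing in for the network: each update uses the gradient of the loss on a small random mini-batch rather than the full sum (the synthetic data, learning rate, and batch size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1D regression data: y = 3 * x + noise.
x = rng.normal(size=1000)
y = 3.0 * x + 0.1 * rng.normal(size=1000)

w = 0.0                      # single parameter of the model f_w(x) = w * x
mu, batch_size = 0.05, 32    # learning rate and mini-batch size

for epoch in range(20):
    order = rng.permutation(x.size)
    for start in range(0, x.size, batch_size):
        batch = order[start:start + batch_size]
        # Gradient of sum_i (y_i - w * x_i)**2, restricted to the mini-batch.
        residual = y[batch] - w * x[batch]
        grad = -2.0 * np.sum(residual * x[batch])
        w = w - mu * grad / batch.size

print(w)   # ~3.0
```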

It's not all bad

Inverse problems

Deconvolution

Inpainting

Denoising

$$ y = \bm{A} x + n $$
\( \bm{A} \) is non-invertible or ill-conditioned
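
A small illustration with an arbitrary nearly-singular \( \bm{A} \): the condition number is huge, and the naive solve amplifies the noise wildly.

```python
import numpy as np

rng = np.random.default_rng(0)

# An ill-conditioned forward operator (arbitrary toy example).
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 2.0])
y = A @ x_true + 1e-3 * rng.normal(size=2)   # y = A x + n

print(np.linalg.cond(A))      # ~4e4: ill-conditioned
print(np.linalg.solve(A, y))  # far from x_true: the small noise is hugely amplified
```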

Regularization

$$ L = \parallel y - \bm{A} x \parallel^2 + R(x) $$

$$ R(x) = \lambda \parallel x \parallel_2 $$
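
A sketch of this on the same ill-conditioned toy example, using the squared form \( \lambda \parallel x \parallel^2 \) of the penalty so the minimizer has a closed form through the normal equations (the value of \( \lambda \) is an arbitrary choice):

```python
import numpy as np

def ridge_solve(A, y, lam):
    """Minimize ||y - A x||^2 + lam * ||x||^2 via its normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Same nearly-singular A and noisy data as above.
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 2.0])
y = A @ x_true + 1e-3 * rng.normal(size=2)

print(ridge_solve(A, y, lam=1e-3))   # stable, much closer to x_true than the naive solve
```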

Check out the Numerical Tours

Another example

$$ L = \parallel y - \bm{A} x \parallel^2 + R(x) $$

$$ R(x) = \lambda \parallel \Phi x \parallel_1 $$
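
The \( \ell_1 \) term is non-smooth, so plain gradient descent no longer applies directly; one standard option is the iterative soft-thresholding algorithm (ISTA), sketched here with \( \Phi \) taken as the identity for simplicity (the synthetic data, \( \lambda \), and iteration count are arbitrary choices):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: element-wise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_steps=1000):
    """Minimize ||y - A x||^2 + lam * ||x||_1 (Phi = identity) by proximal gradient."""
    x = np.zeros(A.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(A, 2)**2)   # 1/L for the smooth data-fit term
    for _ in range(n_steps):
        grad = -2.0 * A.T @ (y - A @ x)            # gradient of the data-fit term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Sparse ground truth observed through a random forward operator.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 100))
x_true = np.zeros(100)
x_true[[5, 42, 77]] = [2.0, -1.5, 3.0]
y = A @ x_true + 0.01 * rng.normal(size=30)

x_hat = ista(A, y, lam=0.5)
print(np.flatnonzero(np.abs(x_hat) > 0.1))   # should (roughly) recover the support [5 42 77]
```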

What I haven't talked about

  • Constrained optimization
  • Simulated annealing
  • NP-hard problems
  • ...

Optimization

By eiffl

Practical statistics series