For a function f for which f′ and f″ exist, the following are equivalent (TFAE):
- f is convex
- f′ is monotonically non-decreasing
- f″ is non-negative

Convexity is preserved by common constructions:
- g(x) = f(ax + b) is convex when f is convex
- g(x) = maxᵢ fᵢ(x) is convex when every fᵢ is convex, and so is g(x) = Σᵢ fᵢ(x)
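As a quick sanity check, the chord inequality f(αu + (1−α)v) ≤ αf(u) + (1−α)f(v) can be tested numerically; the helper below is a hypothetical sketch (passing a finite grid test is evidence, not a proof of convexity).

```python
# Hypothetical numeric check of the convexity definition:
# f(a*u + (1-a)*v) <= a*f(u) + (1-a)*f(v) for all grid points u, v and a in [0, 1].
def is_convex_on_grid(f, points, alphas):
    """Test the chord inequality on a finite grid (necessary, not sufficient)."""
    tol = 1e-9  # slack for floating-point rounding
    for u in points:
        for v in points:
            for a in alphas:
                if f(a * u + (1 - a) * v) > a * f(u) + (1 - a) * f(v) + tol:
                    return False
    return True

alphas = [i / 10 for i in range(11)]
grid = [x / 4 for x in range(-20, 21)]  # points in [-5, 5]

print(is_convex_on_grid(lambda x: x * x, grid, alphas))               # x^2 is convex
print(is_convex_on_grid(lambda x: -x * x, grid, alphas))              # -x^2 is not
print(is_convex_on_grid(lambda x: max(abs(x), x * x), grid, alphas))  # max of convex is convex
```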
For a differentiable function f, it is ρ-Lipschitz if and only if ‖∇f(x)‖ ≤ ρ for all x; in the 1-D case, |f′(x)| ≤ ρ.
- f(x) = |x| is 1-Lipschitz: its derivative is bounded by [-1, 1]
- f(x) = eˣ is not Lipschitz: its derivative eˣ is unbounded above!
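To see the contrast concretely, the Lipschitz constant can be estimated on a finite grid as the largest slope between grid points; the helper below is a hypothetical sketch.

```python
import math

# Sketch: estimate sup |f(u) - f(v)| / |u - v| over pairs of grid points.
def lipschitz_estimate(f, points):
    best = 0.0
    for i, u in enumerate(points):
        for v in points[i + 1:]:
            best = max(best, abs(f(u) - f(v)) / abs(u - v))
    return best

grid = [x / 10 for x in range(-50, 51)]  # points in [-5, 5]
print(lipschitz_estimate(abs, grid))       # ~1: |x| is 1-Lipschitz
print(lipschitz_estimate(math.exp, grid))  # large, and grows with the grid radius: e^x is not Lipschitz
```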
A function is called β-smooth when its derivative is β-Lipschitz.
- f(x) = log(1 + eˣ) is 1/4-smooth: f″(x) = eˣ/(1 + eˣ)² is bounded by [-1/4, 1/4]
- f(x) = x² is 2-smooth: f″(x) = 2 is bounded by [-2, 2]
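A minimal numeric check of the 1/4-smoothness claim: the derivative of log(1 + eˣ) is the sigmoid, so the second derivative σ(x)(1 − σ(x)) should stay within [0, 1/4] (the grid below is an assumption for illustration).

```python
import math

# Sketch: f(x) = log(1 + e^x) has f''(x) = sigmoid(x) * (1 - sigmoid(x)),
# which peaks at x = 0 with value 1/4.
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

vals = [sigmoid(x / 10) * (1 - sigmoid(x / 10)) for x in range(-100, 101)]
print(min(vals) >= 0, max(vals) <= 0.25)  # True True
```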
When f is non-negative and β-smooth we can obtain the self-boundedness property
‖∇f(x)‖² ≤ 2β f(x)
by setting w = x − (1/β)∇f(x) in the smoothness inequality f(w) ≤ f(x) + ⟨∇f(x), w − x⟩ + (β/2)‖w − x‖².
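A quick numeric check of self-boundedness for the non-negative, 2-smooth function f(x) = x² (so β = 2; here the inequality actually holds with equality). The grid is a hypothetical toy setup.

```python
# Self-boundedness: |f'(x)|^2 <= 2 * beta * f(x) for non-negative beta-smooth f.
# Checked for f(x) = x^2 with beta = 2: (2x)^2 = 4x^2 = 2 * 2 * x^2.
beta = 2.0
f = lambda x: x * x
df = lambda x: 2 * x
ok = all(df(x / 10) ** 2 <= 2 * beta * f(x / 10) + 1e-12 for x in range(-50, 51))
print(ok)  # True
```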
These properties compose with linear predictors:
- If f is ρ-Lipschitz, then g(w) = f(⟨w, x⟩ + b) is ρ‖x‖-Lipschitz
- If f is β-smooth, then g(w) = f(⟨w, x⟩ + b) is β‖x‖²-smooth
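The first composition bound can be checked numerically: for f = |·| (so ρ = 1) and a fixed feature vector x, the slopes of g(w) = |⟨w, x⟩| should never exceed ‖x‖. The grid and vector below are assumptions for illustration.

```python
import math

# Sketch: g(w) = |<w, x>| should be ||x||-Lipschitz when f = abs is 1-Lipschitz.
x = [3.0, 4.0]  # ||x|| = 5
g = lambda w: abs(w[0] * x[0] + w[1] * x[1])
norm = math.hypot(*x)

pts = [(a / 2, b / 2) for a in range(-4, 5) for b in range(-4, 5)]
ratios = []
for i, u in enumerate(pts):
    for v in pts[i + 1:]:
        d = math.hypot(u[0] - v[0], u[1] - v[1])
        ratios.append(abs(g(u) - g(v)) / d)
print(max(ratios) <= norm + 1e-9)  # True: the bound rho * ||x|| is respected (and tight)
```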
For example, the logistic loss ℓ(w) = log(1 + exp(−y⟨w, x⟩)) is ‖x‖-Lipschitz and (1/4)‖x‖²-smooth, since log(1 + eᶻ) is 1-Lipschitz and 1/4-smooth.
In the previous argument we obtained constants of the form β‖x‖² (e.g. the loss is β‖x‖²-smooth). But x is a data point that varies, so we also need x to be bounded, say ‖x‖ ≤ R, so that we can say the loss function is βR²-smooth.
A learning problem with a convex hypothesis class and a loss function that is convex in w is called a convex learning problem. So linear regression and logistic regression are convex learning problems.
When we apply the ERM rule to a convex learning problem, we are finding the minimum of a convex function, which is equivalent to solving a convex optimization problem.
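For instance (a hypothetical toy problem), ERM with the squared loss over linear predictors is a convex problem that plain gradient descent solves:

```python
# Sketch: ERM for 1-D linear regression with the squared loss.
# The data below follow y = 2x + 1, so GD should recover w ~ 2, b ~ 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    # gradient of the empirical risk (1/m) * sum (w*x + b - y)^2
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w, b = w - lr * gw, b - lr * gb
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Because the objective is convex, any step size below 2 over the largest Hessian eigenvalue converges to the global minimum.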
Two kinds of convex learning problems are learnable: convex-Lipschitz-bounded and convex-smooth-bounded problems. The learnability proof uses:
- Tikhonov regularization
- Strong convexity: the proof relies on the fact that the regularized loss function is strongly convex
- Lipschitzness
- Self-boundedness (smoothness)
Variants of gradient descent keep essentially the same analysis as GD:
- What if we go out of the boundary? Project back onto the constraint set (projected gradient descent).
- What if we have strong convexity? A decreasing step size (e.g. ηₜ = 1/(λt) for a λ-strongly convex objective) gives a faster rate.
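A minimal sketch of the projection idea, assuming the constraint set is a Euclidean ball of radius B (the objective and parameters below are made up for illustration):

```python
import math

# Projected gradient descent: after each gradient step, if the iterate leaves
# the feasible ball of radius B, project it back onto the ball.
def project_to_ball(w, B):
    n = math.hypot(*w)
    return w if n <= B else tuple(B * wi / n for wi in w)

# minimize f(w) = (w0 - 3)^2 + (w1 - 4)^2 over the unit ball;
# the constrained minimizer is the projection of (3, 4), i.e. (0.6, 0.8).
B, lr = 1.0, 0.1
w = (0.0, 0.0)
for _ in range(200):
    g = (2 * (w[0] - 3), 2 * (w[1] - 4))
    w = (w[0] - lr * g[0], w[1] - lr * g[1])
    w = project_to_ball(w, B)
print(tuple(round(wi, 3) for wi in w))  # (0.6, 0.8)
```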
With SGD we directly minimize the risk, using an unbiased estimate of its gradient: the gradient of the loss on a single randomly drawn example. Following the SGD analysis, we obtain the result we want.
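A sketch of SGD on a toy problem: each step uses the gradient on one uniformly sampled example, which is an unbiased estimate of the empirical-risk gradient, and we report the averaged iterate as the analysis prescribes (all parameters below are assumptions):

```python
import random

# Sketch: SGD for 1-D linear regression. The data follow y = 2x + 1,
# so the averaged iterate should approach w ~ 2, b ~ 1.
random.seed(0)
data = [(x / 10, 2 * x / 10 + 1) for x in range(-10, 11)]
w, b = 0.0, 0.0
avg_w, avg_b = 0.0, 0.0
T, lr = 20000, 0.05
for _ in range(T):
    x, y = random.choice(data)  # uniform sample -> unbiased gradient estimate
    e = w * x + b - y
    w, b = w - lr * 2 * e * x, b - lr * 2 * e
    avg_w += w / T
    avg_b += b / T
print(round(avg_w, 1), round(avg_b, 1))  # 2.0 1.0
```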
| samples / iterations | RLM | SGD |
|---|---|---|
| convex-Lipschitz-bounded | O(ρ²B²/ε²) | O(ρ²B²/ε²) |
| convex-smooth-bounded | O(βB²/ε) | O(βB²/ε) |
SGD takes us from a learning rule to a specific algorithm.
When training deep neural networks (DNNs), SGD is the most common training algorithm, and in practice it is a very successful method.