Huang Fang
June 22, 2024
For
Different coordinate selection rules:
Forty-Two Open Problems in the Mathematics of Data Science
regularizer
[Figure: panels labeled Iter 1, Iter 2, Iter 3, Iter 4, and a final Iter panel]
data fitting
[FFSF, AISTATS'20]
We provide a theoretical characterization of GCD's screening ability:
for
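A minimal sketch of greedy coordinate descent may help make the screening idea concrete. It assumes an L1-regularized least-squares (lasso) objective and a greedy rule that picks the coordinate whose exact update moves the most; the slides do not say which objective or which greedy rule the analysis uses, so both are illustrative assumptions, and the names `soft_threshold` and `greedy_cd_lasso` are hypothetical.

```python
def soft_threshold(z, tau):
    """Proximal operator of tau * |.|: shrink z toward zero by tau."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def greedy_cd_lasso(A, b, lam, max_iters=1000, tol=1e-12):
    """Greedy coordinate descent for 0.5 * ||A x - b||^2 + lam * ||x||_1.

    Assumes every column of A is nonzero. At each iteration the coordinate
    whose exact minimizer is farthest from its current value is updated.
    """
    n, d = len(A), len(A[0])
    x = [0.0] * d
    residual = [-bi for bi in b]  # A x - b evaluated at x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(max_iters):
        best_j, best_value, best_move = -1, 0.0, tol
        for j in range(d):
            # Partial derivative of the smooth part with respect to x_j.
            grad_j = sum(A[i][j] * residual[i] for i in range(n))
            candidate = soft_threshold(x[j] - grad_j / col_sq[j], lam / col_sq[j])
            move = abs(candidate - x[j])
            if move > best_move:
                best_j, best_value, best_move = j, candidate, move
        if best_j < 0:  # no coordinate moves by more than tol
            break
        delta = best_value - x[best_j]
        for i in range(n):
            residual[i] += A[i][best_j] * delta
        x[best_j] = best_value
    return x


# Toy problem with an identity design, so the solution is soft_threshold(b, lam).
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
x_hat = greedy_cd_lasso(A, b, lam=1.0)
support = [j for j, xj in enumerate(x_hat) if xj != 0.0]
print(x_hat, support)
```

On this toy problem the middle coordinate is never selected and stays at zero, which is the kind of behavior a screening result describes: coordinates outside the final support are left untouched.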
Learning a sparse representation of an atomic set:
such that
Our contribution: how to identify the atoms with nonzero coefficients at the solution during the optimization process.
[FFF, OJMO'24]
Play a game for $T$ rounds, for $t = 1, \dots, T$
The goal of an online learning algorithm: obtain sublinear regret
player's loss
competitor's loss
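For reference, the standard definition of regret against a fixed competitor can be written as follows. The symbols $f_t$ (round-$t$ loss), $x_t$ (player's decision), $z$ (competitor), and $\mathcal{X}$ (decision set) are assumptions, since the slide's own formula did not survive extraction.

```latex
\mathrm{Regret}_T(z)
  \;=\; \underbrace{\sum_{t=1}^{T} f_t(x_t)}_{\text{player's loss}}
  \;-\; \underbrace{\sum_{t=1}^{T} f_t(z)}_{\text{competitor's loss}},
  \qquad z \in \mathcal{X},
\qquad\text{sublinear regret:}\quad
  \lim_{T \to \infty} \frac{\mathrm{Regret}_T(z)}{T} = 0 .
```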
Our contribution: fix the divergence issue of mirror descent (MD) and obtain sublinear regret.
[FHPF, ICML'20, JMLR'22]
[Figure: primal space, dual space, and Bregman projection. Figure credited to Victor Portella.]
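The primal/dual/Bregman-projection picture can be sketched in code for one concrete case. The example below assumes the negative-entropy mirror map on the probability simplex, for which the Bregman projection reduces to normalization; this choice and the name `omd_entropy_step` are illustrative assumptions. It is plain online mirror descent and does not include the stabilization step.

```python
import math


def omd_entropy_step(x, grad, eta):
    """One online mirror descent step on the simplex with the entropy mirror map.

    x must have strictly positive entries that sum to one.
    """
    # Primal -> dual: the mirror map sends x_i to log(x_i), up to a constant.
    # The gradient step is then taken in the dual space.
    theta = [math.log(xi) - eta * gi for xi, gi in zip(x, grad)]
    # Dual -> primal: exponentiate. Subtracting the max only rescales the
    # result, and the rescaling is undone by the normalization below.
    shift = max(theta)
    y = [math.exp(t - shift) for t in theta]
    # Bregman (KL) projection onto the simplex is a normalization.
    total = sum(y)
    return [yi / total for yi in y]


x0 = [0.5, 0.5]
x1 = omd_entropy_step(x0, grad=[1.0, 0.0], eta=math.log(2.0))
print(x1)  # mass moves away from the coordinate with the larger loss gradient
```

With this mirror map the update coincides with the multiplicative-weights (exponentiated gradient) rule, which is one reason it is a common first example of mirror descent.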
[FHPF, ICML'20, JMLR'22]
With stabilization, OMD can obtain sublinear regret.
smooth
nonsmooth
Some discrepancies between theory and practice:
Two important structures:
where the loss function is nonnegative, convex, and 1-smooth, and the inner functions are Lipschitz continuous.
[FFF, ICLR'21]
Smooth: square loss, L2-hinge loss, logistic loss, etc.
Nonsmooth: absolute loss, L1-hinge loss.
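The difference between the two groups of losses can be checked numerically. The sketch below assumes a scalar prediction `z` and a label `y` (with `y` in {-1, +1} for the margin losses); the function names, the scaling of each loss, and the helper `one_sided_slopes` are illustrative choices, not taken from the slides.

```python
import math


def square_loss(z, y):
    return 0.5 * (z - y) ** 2


def l2_hinge_loss(z, y):
    return max(0.0, 1.0 - y * z) ** 2


def logistic_loss(z, y):
    return math.log1p(math.exp(-y * z))


def absolute_loss(z, y):
    return abs(z - y)


def l1_hinge_loss(z, y):
    return max(0.0, 1.0 - y * z)


def one_sided_slopes(loss, z, y, h=1e-6):
    """Finite-difference slopes just to the left and right of z."""
    left = (loss(z, y) - loss(z - h, y)) / h
    right = (loss(z + h, y) - loss(z, y)) / h
    return left, right


# At the margin z = 1 (with y = 1) the L1-hinge loss has a kink:
# the slope jumps from about -1 to 0. The L2-hinge loss has no such jump.
print(one_sided_slopes(l1_hinge_loss, 1.0, 1.0))
print(one_sided_slopes(l2_hinge_loss, 1.0, 1.0))
```

A jump in the one-sided slopes is what makes a loss nonsmooth at that point; the absolute loss shows the same behavior at `z = y`.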
With constant learning rate, we prove
[FFF, ICLR'21]
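A small experiment can illustrate stochastic gradient descent with a constant learning rate. The sketch assumes a square loss and a toy dataset that a linear model fits exactly, so every per-sample loss can be driven to zero at once; the data, the step size, and the names `sgd_constant_lr` and `mean_loss` are hypothetical. It does not reproduce the rates proved in the paper, only the qualitative behavior.

```python
import random


def sgd_constant_lr(samples, lr, steps, seed=0):
    """SGD on 0.5 * (a . w - b)^2 with a fixed learning rate, for 2-d features."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        a, b = rng.choice(samples)  # draw one sample uniformly at random
        err = a[0] * w[0] + a[1] * w[1] - b
        w = [w[0] - lr * err * a[0], w[1] - lr * err * a[1]]
    return w


def mean_loss(samples, w):
    return sum(0.5 * (a[0] * w[0] + a[1] * w[1] - b) ** 2 for a, b in samples) / len(samples)


# Noise-free data generated by b = 2 + 3 * x, so w = (2, 3) fits every sample.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
samples = [((1.0, x), 2.0 + 3.0 * x) for x in xs]
w_hat = sgd_constant_lr(samples, lr=0.5, steps=2000)
print(w_hat, mean_loss(samples, w_hat))
```

Because the stochastic gradient vanishes at the exact fit, the iterates do not need a decaying step size to settle; with noisy labels the same constant step would instead leave the iterates hovering around the solution.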
Two follow-up questions:
[FFF, ICLR'21]
The answer to both of the above questions is "no".
We derive lower bounds for iteration complexity:
where is the number of times that the i-th rule is satisfied.
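For context, the textbook Markov logic network (MLN) distribution has the form below. This is the standard formulation and is given here as an assumption, since the slide's own formula did not survive extraction; $w_i$ is the weight of the $i$-th rule, $n_i(x)$ counts how many times that rule is satisfied in the assignment $x$, and $Z$ is the normalizing constant.

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\!\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\!\Big( \sum_{i} w_i \, n_i(x') \Big).
```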
[FLCS, WWW'23]
~10,000x speedup!
[FLCS, WWW'23]
[CCFHS, NeurIPS'23]
DiffLogic: combining (soft) MLN with knowledge graph embedding.
LLM pretraining is essentially an optimization problem! Lower loss means better performance.
Figure comes from "Llama 2: Open Foundation and Fine-Tuned Chat Models"
The training curve of a 546B model from Meta.
Figure comes from "A Theory on Adam Instability in Large-Scale Machine Learning"
training
inference
Figure comes from "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale"
for all PSD matrices
The matrix AM-GM inequality conjecture is false [LL20, S20].
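One common statement of the conjectured matrix AM-GM inequality is given below. It is written from the general literature rather than recovered from the slide, so the notation ($A_1, \dots, A_n$ for the PSD matrices, $S_n$ for the permutations of $\{1, \dots, n\}$, $\|\cdot\|$ for the spectral norm) is an assumption.

```latex
\Big\| \frac{1}{n!} \sum_{\sigma \in S_n} A_{\sigma(1)} A_{\sigma(2)} \cdots A_{\sigma(n)} \Big\|
\;\le\;
\Big\| \frac{1}{n} \sum_{i=1}^{n} A_i \Big\|^{n}
\qquad \text{for all PSD matrices } A_1, \dots, A_n .
```

The left side averages products taken in every order without repetition, which is why the inequality is tied to comparing without-replacement and with-replacement sampling.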
| | smooth | nonsmooth | smooth + IC | nonsmooth + IC |
|---|---|---|---|---|
| convex | | | | |
| strongly convex | | | | |