Householder-based algorithm for the general eigenvalue problem

Luis Manuel Román García

ITAM, 2018

Presentation Overview

  1. Problem description

  2. QZ traps and pitfalls

  3. The Hessenberg reduction step

  4. Conclusion

Problem description

What is a hard problem?

Ax = \lambda x

A quick glance through the contents of the roughly 60 papers published per year in SIMAX shows that about 40% of them are associated with eigenvalue problems, and this proportion likely holds, more or less, across the broader numerical linear algebra literature.

A harder problem

Ax = \lambda B x

The generalized eigenvalue problem

Steps:

1.- Reduce B to upper triangular form (QR factorization)

2.- Reduce A to upper Hessenberg form (keeping B triangular)

3.- Reduce A to quasi-triangular form (QZ iteration)

4.- Compute the eigenvalues

5.- Compute the eigenvectors
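
A minimal sketch of this pipeline, assuming NumPy and SciPy are available; scipy.linalg.qz carries out the Hessenberg-triangular reduction and the QZ iteration internally, so steps 1-4 collapse into a single call here (the random test pencil is purely illustrative):

import numpy as np
from scipy.linalg import qz, eig

# Illustrative random pencil (A, B); the size is arbitrary.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Steps 1-3: generalized Schur form A = Q @ AA @ Z^H, B = Q @ BB @ Z^H,
# where AA and BB are upper triangular for output='complex'.
AA, BB, Q, Z = qz(A, B, output='complex')

# Step 4: each eigenvalue is the ratio alpha/beta of diagonal entries.
alphas, betas = np.diag(AA), np.diag(BB)
lams = alphas / betas

# Step 5: eigenvectors from the LAPACK driver, for comparison.
w, V = eig(A, B)

# Residual check: A v should be close to lambda B v for every computed pair.
res = max(np.linalg.norm(A @ V[:, i] - w[i] * (B @ V[:, i])) for i in range(n))
print(lams, 'max residual:', res)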

QZ traps and pitfalls

1.- Infinite eigenvalues

2.- Vanishing entries

3.- Computationally intensive
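
The first of these appears whenever B is singular: the corresponding diagonal entry beta of the triangularized B is (numerically) zero, so the ratio alpha/beta blows up. A small sketch, again assuming SciPy, with a deliberately singular 2x2 pencil (matrix and tolerance are illustrative):

import numpy as np
from scipy.linalg import qz

# B has a zero row, so the pencil (A, B) carries one infinite eigenvalue.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])

AA, BB, Q, Z = qz(A, B, output='complex')
for a, b in zip(np.diag(AA), np.diag(BB)):
    if abs(b) <= 1e-12 * np.linalg.norm(B):
        print('beta ~ 0  ->  infinite eigenvalue (alpha =', a, ')')
    else:
        print('finite eigenvalue:', a / b)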

 

The Hessenberg reduction step

Second order methods

w_{k+1} = w_k - \alpha H_{(|S|_k)} \frac{1}{|X|_k}\displaystyle\sum_{i=1}^{|X|_k}\nabla l(h_{w_k})
|S|_k\leq |X|_k
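
To make the notation concrete, here is a minimal sketch of one such update for a least-squares loss, assuming H_{(|S|_k)} denotes the inverse of a Hessian estimated on a subsample S_k while the gradient is averaged over the larger batch X_k (the data, loss, damping, and step size are illustrative, not from the slides):

import numpy as np

def subsampled_newton_step(w, X, y, S_idx, alpha=1.0, damping=1e-6):
    # One update w_{k+1} = w_k - alpha * H_{S_k}^{-1} * (mean gradient over X_k)
    # for the least-squares loss l_i(w) = 0.5 * (x_i^T w - y_i)^2.
    grad = X.T @ (X @ w - y) / len(y)                  # (1/|X_k|) sum_i grad l_i
    Xs = X[S_idx]                                      # subsample S_k, |S_k| <= |X_k|
    H = Xs.T @ Xs / len(S_idx) + damping * np.eye(X.shape[1])
    return w - alpha * np.linalg.solve(H, grad)        # Newton-like step

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 10))
w_true = rng.standard_normal(10)
y = X @ w_true
w = np.zeros(10)
for k in range(20):
    S_idx = rng.choice(len(y), size=100, replace=False)
    w = subsampled_newton_step(w, X, y, S_idx)
print('distance to w*:', np.linalg.norm(w - w_true))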

Under some regularity assumptions, the best we can expect is superlinear to quadratic convergence:

 

Best case scenario

|X_k| \geq |X_0|\eta_k^k;\quad |X_0|\geq\bigg(\frac{6v\gamma M}{\hat{\mu}^2}\bigg),\quad \eta_k > \eta_{k-1},\quad \eta_k \rightarrow\infty,\quad \eta_1>1
|S_k| > |S_{k-1}|;\quad \displaystyle\lim_{k\rightarrow\infty}|S_k|=\infty; \quad |S_0|\geq\bigg(\frac{4\sigma}{\hat{\mu}}\bigg)^2
\|w_0-w^*\|\leq\frac{\hat{\mu}}{3\gamma M}
\mathbb{E}[\|w_k - w^*\|]\leq\tau_k,\quad\quad\displaystyle\lim_{k\rightarrow\infty}\frac{\tau_{k+1}}{\tau_k} = 0
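
A small illustration of sampling schedules compatible with these conditions; the constants are arbitrary, the point is only that eta_k grows without bound (with eta_1 > 1) and |S_k| increases strictly:

import math

X0, S0 = 64, 16
eta = lambda k: 1.0 + 0.5 * k                                        # eta_1 = 1.5 > 1, eta_k -> infinity
batch_sizes  = [math.ceil(X0 * eta(k) ** k) for k in range(1, 7)]    # |X_k| >= |X_0| * eta_k^k
sample_sizes = [math.ceil(S0 * 1.5 ** k)    for k in range(1, 7)]    # |S_k| > |S_{k-1}|
print(batch_sizes)
print(sample_sizes)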

In an online scenario, the regret grows as O(log(T)):

 

Best case scenario

\gamma = \frac{1}{2}\min\left\{\frac{1}{4GD}, \alpha\right\},\quad \epsilon = \frac{1}{\gamma^2 D^2}
\text{regret}_T \leq 5\left(\frac{1}{\alpha}+ GD\right)n\log(T)
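
These are the parameter choices used in the Online Newton Step analysis. A hedged sketch of that update rule, omitting the generalized projection back onto the diameter-D feasible set that the full algorithm performs after every step:

import numpy as np

def online_newton_step(grad_fn, x0, T, G, D, alpha):
    # Sketch of the Online Newton Step update. grad_fn(t, x) returns the gradient
    # of the loss revealed at round t; the projection back onto the diameter-D
    # feasible set (in the norm induced by A_t) is omitted for brevity.
    n = x0.size
    gamma = 0.5 * min(1.0 / (4.0 * G * D), alpha)
    eps = 1.0 / (gamma ** 2 * D ** 2)
    A = eps * np.eye(n)                       # A_0 = eps * I_n
    x = x0.copy()
    for t in range(T):
        g = grad_fn(t, x)
        A = A + np.outer(g, g)                # A_t = A_{t-1} + g_t g_t^T
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)
    return x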

The advantages (and the main cost) of second order methods:

 

  • Faster convergence rate

  • Embarrassingly parallel

  • Take into consideration curvature information

  • Ideal for highly varying functions

  • Higher per-iteration cost

2010-2016

  • Martens is the first to successfully train a deep convolutional neural network with L-BFGS.

  • Sutskever successfully trains a recurrent neural network with a generalized Gauss-Newton algorithm.

  • Bengio achieves state-of-the-art results training recurrent networks with second order methods.

Conclusion
