21MAT212

Fast and Flexible Algorithms for Trend Filtering

Mathematics on Intelligent Systems - 4

Team 2

Anirudh Edpuganti        - CB.EN.U4AIE20005
Jyothis Viruthi Santosh  - CB.EN.U4AIE20025
Onteddu Chaitanya Reddy  - CB.EN.U4AIE20045
Pillalamarri Akshaya     - CB.EN.U4AIE20049
Pingali Sathvika         - CB.EN.U4AIE20050

Contents

  • Trend Filtering
  • What is D?
  • Standard ADMM
  • Specialized ADMM
  • Why???

Trend Filtering

Trend filtering faces a trade-off between two objectives:

  • Minimizing the residual noise: \frac{1}{2}||y-\beta||^2_2
  • Maximizing the smoothness: {||D^{(k+1)}\beta||}_1

Input points: x = (x_1,x_2,...,x_n) \in \mathbb{R}^n, evenly spaced

Observed points: y = (y_1,y_2,...,y_n) \in \mathbb{R}^n

Trend filter estimate: \hat\beta = (\hat\beta_1,\hat\beta_2,...,\hat\beta_n) \in \mathbb{R}^n

Discrete difference operator: D^{(k+1)} \in \mathbb{R}^{(n-k-1) \times n}, \; k \geq 0

It captures the smoothness between every set of k+2 points.

The two objectives are held in parallel: the regularisation parameter \lambda captures the trade-off between them.

So, our trend filtering estimate becomes

\hat\beta = \underset{\beta \in \mathbb{R}^n}{\operatorname{argmin}} \; \frac{1}{2}||y-\beta||^2_2 + \lambda\,{||D^{(k+1)}\beta||}_1
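As a concrete reading of this objective, here is a minimal NumPy sketch (the helper name tf_objective is our own, and D is assumed precomputed; constructing D^{(k+1)} is covered in the next section):

# Minimal sketch of the trend filtering objective, assuming NumPy and a
# precomputed difference operator D (its construction is shown later).
import numpy as np

def tf_objective(beta, y, D, lam):
    """(1/2)||y - beta||_2^2 + lam * ||D beta||_1."""
    fit = 0.5 * np.sum((y - beta) ** 2)      # residual noise term
    smooth = lam * np.sum(np.abs(D @ beta))  # (k+1)-th order smoothness term
    return fit + smooth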

Discrete Difference Operator (D)

Constant-Order Trend Filtering

Observations: y = (y_1,y_2,...,y_n) \in \mathbb{R}^n

1D fused lasso problem:

\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}\sum^n_{i=1}(y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1}|\beta_i - \beta_{i+1}|

Linear Trend Filtering

Linear trend filtering problem:

\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}\sum^n_{i=1}(y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-2}|\beta_i - 2\beta_{i+1} + \beta_{i+2}|

Simpler and Better

The 1D fused lasso can be rewritten as

\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda\,{||D^{(1)} \beta||}_1

where

D^{(1)} = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 & 0\\ 0 & -1 & 1 & \cdots & 0 & 0\\ & & \ddots & \ddots & & \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix} \in \mathbb{R}^{(n-1) \times n}
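As a sanity check, D^{(1)} can be built from two shifted identity matrices; a sketch assuming NumPy (first_diff is our own helper name):

# Build D^(1) in R^{(n-1) x n} and check it takes first differences.
import numpy as np

def first_diff(n):
    """Rows of the form (..., -1, 1, ...): (D b)_i = b_{i+1} - b_i."""
    return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

beta = np.array([1.0, 3.0, 6.0, 10.0, 15.0])
print(first_diff(5) @ beta)   # [2. 3. 4. 5.]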

Replacing D^{(1)} with D^{(2)} makes it a linear trend filtering problem:

\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda\,{||D^{(2)} \beta||}_1

where

D^{(2)} = \begin{bmatrix} -1 & 2 & -1 & \cdots & 0 & 0 & 0\\ 0 & -1 & 2 & \cdots & 0 & 0 & 0\\ & & \ddots & \ddots & \ddots & & \\ 0 & 0 & 0 & \cdots & -1 & 2 & -1 \end{bmatrix} \in \mathbb{R}^{(n-2) \times n}

The penalty term for polynomial trend filtering of order k is {||D^{(k+1)}\beta||}_1:

\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda\,{||D^{(k+1)} \beta||}_1

Recursive relation:

D^{(k+1)} = D^{(1)}D^{(k)} \in \mathbb{R}^{(n-k-1) \times n}
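The recursion translates directly into code; a sketch assuming NumPy and the first_diff helper above (diff_op is our own name). Note that the left factor at each step is the first difference operator of the smaller size, which is exactly what the next slides justify:

# D^(order) via the recursion D^(j+1) = D^(1) D^(j); the sizes telescope.
def diff_op(n, order):
    """Return D^(order) in R^{(n-order) x n}."""
    D = first_diff(n)                 # (n-1) x n
    for j in range(1, order):
        D = first_diff(n - j) @ D     # (n-j-1) x (n-j) times (n-j) x n
    return D

print(diff_op(6, 2).shape)   # (4, 6), i.e. (n-2) x n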

Discrete Difference Operator (D)

Understanding the dimensions

D^{(k+1)} = D^{(1)}D^{(k)} \in \mathbb{R}^{(n-k-1) \times n}

Let k=1:

D^{(2)} = D^{(1)}D^{(1)}, with factors of size (n-2) \times (n-1) and (n-1) \times n, so the product is (n-2) \times n.

For general k:

D^{(k+1)} = D^{(1)}D^{(k)}, with factors of size (n-k-1) \times (n-k) and (n-k) \times n, so the product is (n-k-1) \times n.
ADMM Algorithm

Our problem:

\min_{\beta \in \mathbb{R}^n,\, \alpha \in \mathbb{R}^{n-k-1}} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda\,{||\alpha||}_1 \quad \text{s.t.} \quad \alpha = D^{(k+1)}\beta

Augmented Lagrangian:

L_{\rho}(\beta, \alpha, z) = \frac{1}{2}{||y - \beta||}_2^2 + \lambda\,{||\alpha||}_1 + z^T(\alpha - D^{(k+1)}\beta) + \frac{\rho}{2}||\alpha - D^{(k+1)}\beta||_2^2

Using the scaled dual variable u = \frac{1}{\rho}z and letting r = \alpha - D^{(k+1)}\beta, the augmented Lagrangian becomes

L_{\rho}(\beta, \alpha, z) = \frac{1}{2}{||y - \beta||}_2^2 + \lambda\,{||\alpha||}_1 + z^T r + \frac{\rho}{2}||r||_2^2

ADMM Algorithm

Manipulating the last two terms (completing the square in r + u):

z^T r + \frac{\rho}{2}||r||_2^2 = \frac{\rho}{2}\Big[\frac{2}{\rho}z^T r + r^T r + \frac{1}{\rho^2}z^T z - \frac{1}{\rho^2}z^T z\Big]
= \frac{\rho}{2}\big[2u^T r + r^T r + u^T u - u^T u\big]
= \frac{\rho}{2}\big[2u^T r + r^T r + u^T u\big] - \frac{\rho}{2}{||u||}_2^2
= \frac{\rho}{2}\big[u^T(r+u) + r^T(u+r)\big] - \frac{\rho}{2}{||u||}_2^2
= \frac{\rho}{2}(r+u)^T(r+u) - \frac{\rho}{2}{||u||}_2^2
= \frac{\rho}{2}{||r+u||}_2^2 - \frac{\rho}{2}{||u||}_2^2

Back-substituting r = \alpha - D^{(k+1)}\beta:

z^T r + \frac{\rho}{2}||r||_2^2 = \frac{\rho}{2}{||\alpha - D^{(k+1)}\beta + u||}_2^2 - \frac{\rho}{2}{||u||}_2^2

ADMM Algorithm

Augmented Lagrangian with scaled dual variable:

L_{\rho}(\beta, \alpha, u) = \frac{1}{2}{||y - \beta||}_2^2 + \lambda\,{||\alpha||}_1 + \frac{\rho}{2}{||\alpha - D^{(k+1)}\beta + u||}_2^2 - \frac{\rho}{2}{||u||}_2^2

Updates for \beta, \alpha, u:

\beta \leftarrow \big(I + \rho(D^{(k+1)})^T D^{(k+1)}\big)^{-1}\big(y + \rho(D^{(k+1)})^T(\alpha + u)\big)

\alpha \leftarrow S_{\lambda/\rho}(D^{(k+1)}\beta - u)

u \leftarrow u + \alpha - D^{(k+1)}\beta

ADMM Algorithm

Derivation for \beta: omitting the terms constant in \beta,

\beta \leftarrow \operatorname{argmin}_{\beta} \; \frac{1}{2}\|y-\beta\|_{2}^{2}+\frac{\rho}{2}\left\|\alpha + u - D^{(k+1)} \beta\right\|_{2}^{2}

Differentiating and equating to 0:

-(y-\beta) + \rho\left(D^{(k+1)}\right)^{\top}\left(D^{(k+1)}\beta - \alpha - u\right) = 0

\beta + \rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\beta = y + \rho\left(D^{(k+1)}\right)^{\top}(\alpha+u)

\left(I + \rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\right)\beta = y + \rho\left(D^{(k+1)}\right)^{\top}(\alpha+u)

\beta \leftarrow \left(I + \rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\right)^{-1}\left(y + \rho\left(D^{(k+1)}\right)^{\top}(\alpha+u)\right)
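In code, the \beta update is a single linear solve; a sketch assuming SciPy with D stored as a sparse matrix (beta_update is our own name). Since D^{(k+1)} is banded, I + \rho D^T D is banded too, which is what keeps this step fast:

# Solve (I + rho D^T D) beta = y + rho D^T (alpha + u); the system is banded.
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def beta_update(y, D, alpha, u, rho):
    A = sp.eye(y.size, format="csc") + rho * (D.T @ D)
    b = y + rho * (D.T @ (alpha + u))
    return spsolve(sp.csc_matrix(A), b)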

ADMM Algorithm

Derivation for \alpha:

\alpha \leftarrow \operatorname{argmin}_{\alpha} \; \lambda\,{||\alpha||}_1 + \frac{\rho}{2}{||\alpha - D^{(k+1)} \beta + u||}_2^2

This can be rewritten as

\alpha \leftarrow \operatorname{argmin}_{\alpha} \; {||\alpha||}_1 + \frac{1}{2(\lambda/\rho)}{||\alpha - (D^{(k+1)} \beta - u)||}_2^2

which is the proximal problem of the \ell_1 norm, solved element-wise by soft-thresholding:

\alpha \leftarrow S_{\frac{\lambda}{\rho}}(D^{(k+1)}\beta - u)
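The soft-thresholding operator S_t has the familiar closed form; a one-line sketch assuming NumPy (soft_threshold is our own name):

# S_t(x)_i = sign(x_i) * max(|x_i| - t, 0), applied element-wise.
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)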

ADMM Algorithm

Derivation for u: the (scaled) dual ascent step,

u \leftarrow u + \alpha - D^{(k+1)} \beta
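Putting the three updates together gives the standard ADMM loop; a minimal sketch assuming NumPy/SciPy and the diff_op, beta_update, and soft_threshold helpers sketched above (a fixed iteration count stands in for a proper residual-based stopping rule):

# Standard ADMM for trend filtering of order k.
import numpy as np
import scipy.sparse as sp

def trend_filter_admm(y, k, lam, rho=1.0, n_iter=200):
    n = y.size
    D = sp.csc_matrix(diff_op(n, k + 1))   # D^(k+1): (n-k-1) x n
    alpha = np.zeros(n - k - 1)
    u = np.zeros(n - k - 1)
    beta = y.copy()
    for _ in range(n_iter):
        beta = beta_update(y, D, alpha, u, rho)          # linear solve
        alpha = soft_threshold(D @ beta - u, lam / rho)  # l1 prox
        u = u + alpha - D @ beta                         # dual ascent
    return beta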

Specialized ADMM

Problem:

\min_{\beta \in \mathbb{R}^{n},\, \alpha \in \mathbb{R}^{n-k}} \; \frac{1}{2}\|y-\beta\|_{2}^{2}+\lambda\left\|D^{(1)} \alpha\right\|_{1} \quad \text{s.t.} \quad \alpha=D^{(k)} \beta

Augmented Lagrangian (scaled form):

L(\beta, \alpha, u)=\frac{1}{2}\|y-\beta\|_{2}^{2}+\lambda\left\|D^{(1)} \alpha\right\|_{1}+\frac{\rho}{2}\left\|\alpha-D^{(k)} \beta+u\right\|_{2}^{2}-\frac{\rho}{2}{||u||}_{2}^{2}

Updates for \beta, \alpha, u:

\beta \leftarrow \big(I + \rho(D^{(k)})^T D^{(k)}\big)^{-1}\big(y + \rho(D^{(k)})^T(\alpha + u)\big)

\alpha \leftarrow \underset{\alpha \in \mathbb{R}^{n-k}}{\operatorname{argmin}} \; \frac{1}{2}\left\|D^{(k)} \beta-u-\alpha\right\|_{2}^{2}+\frac{\lambda}{\rho}\left\|D^{(1)} \alpha\right\|_{1}

u \leftarrow u + \alpha - D^{(k)} \beta

The \alpha update is itself a 1D fused lasso problem, solved exactly by dynamic programming:

\alpha \leftarrow DP_{\frac{\lambda}{\rho}} (D^{(k)} \beta-u)
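The specialized loop differs from the standard one only in using D^{(k)} and in the \alpha step. A sketch assuming NumPy/SciPy and the diff_op and beta_update helpers from earlier; fused_lasso_1d is a hypothetical placeholder for any exact 1D fused lasso solver of \min_a \frac{1}{2}||b-a||_2^2 + t\sum|a_i - a_{i+1}| (the DP step above):

# Specialized ADMM for trend filtering of order k (assumes k >= 1;
# for k = 0 the problem is already a 1D fused lasso).
import numpy as np
import scipy.sparse as sp

def trend_filter_special_admm(y, k, lam, rho=1.0, n_iter=200):
    n = y.size
    D = sp.csc_matrix(diff_op(n, k))       # note: D^(k), not D^(k+1)
    alpha = np.zeros(n - k)
    u = np.zeros(n - k)
    beta = y.copy()
    for _ in range(n_iter):
        beta = beta_update(y, D, alpha, u, rho)
        alpha = fused_lasso_1d(D @ beta - u, lam / rho)  # DP_{lam/rho} step
        u = u + alpha - D @ beta
    return beta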

Why???

  • Why Lasso over Ridge?
  • Why the scaled form?
  • Why Specialized over Standard?
  • Why not (n-1)?
  • Why k+2 points?

Why not (n-1)?

Recall the dimensions in the recursion D^{(k+1)} = D^{(1)}D^{(k)} \in \mathbb{R}^{(n-k-1) \times n}.

Let k=1, so D^{(2)} = D^{(1)}D^{(1)}. It is tempting to read both factors as the (n-1) \times n matrix defined earlier, but then the product would not even be defined. The left factor acts on the (n-1)-dimensional output of the right factor, and the first difference of a vector in \mathbb{R}^{n-1} has only n-2 entries, so the left factor is (n-2) \times (n-1), not (n-1) \times (n-1). Hence

D^{(2)} = D^{(1)}D^{(1)}: \quad (n-2) \times (n-1) \; \text{times} \; (n-1) \times n \;=\; (n-2) \times n

and D^{(2)} is (n-2) \times n, not (n-1) \times n.

Why k+2 points?

Each entry of D^{(k+1)}\beta is a (k+1)-st order difference, and computing it touches k+2 consecutive points. For k=1, the second difference is built from two overlapping first differences of 3 = k+2 points:

[f(3) - f(2)] - [f(2) - f(1)] = f(1) - 2f(2) + f(3)
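The same pattern holds one order up; as a quick check of our own (not from the slides), for k=2 the third difference combines two second differences and touches k+2 = 4 points:

[f(4) - 2f(3) + f(2)] - [f(3) - 2f(2) + f(1)] = -f(1) + 3f(2) - 3f(3) + f(4)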

Thank you, Ma'am!
