
Fast and Flexible Algorithms for Trend Filtering

Mathematics on Intelligent Systems- 4


Anirudh Edpuganti                     -         CB.EN.U4AIE20005

Jyothis Viruthi Santosh               -         CB.EN.U4AIE20025

Onteddu Chaitanya Reddy        -         CB.EN.U4AIE20045
Pillalamarri Akshaya                   -         CB.EN.U4AIE20049

Pingali Sathvika                           -         CB.EN.U4AIE20050






  • Trend Filtering
  • What is D
  • Standard ADMM
  • Specialized ADMM
  • Why?


Trend Filtering

Faces a trade-off between 2 objectives

Minimizing the residual noise

\frac{1}{2}{||y - \beta||}_2^2

Maximizing the smoothness

{||D^{k+1} \beta||}_1

Output points

y = (y_1,y_2,...,y_n) \in \mathbb{R}^n

Input points

x = (x_1,x_2,...,x_n) \in \mathbb{R}^n

Evenly spaced

Trend Filter estimate

\hat\beta = (\hat\beta_1,\hat\beta_2,...,\hat\beta_n) \in \mathbb{R}^n

Discrete Difference Operator

D^{k+1} \in \mathbb{R}^{(n-k-1) \times n}
k \geq 0

Captures the smoothness between every set of k+2 points

Both objectives must hold in parallel

Regularisation parameter

\lambda \geq 0

Captures the trade-off between the 2 objectives

Trend Filtering

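So, our trend filtering estimate becomes

\hat\beta = \underset{\beta \in \mathbb{R}^n}{\operatorname{argmin}} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||D^{k+1} \beta||}_1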
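To make the two competing terms concrete, here is a minimal sketch that evaluates the trend filtering objective for a candidate \beta. The function name `trend_filter_objective` and the sample data are illustrative assumptions, and `np.diff` is used as a stand-in for multiplying by D^{k+1}.

```python
import numpy as np

def trend_filter_objective(beta, y, lam, k):
    """Evaluate 0.5 * ||y - beta||_2^2 + lam * ||D^{k+1} beta||_1."""
    beta = np.asarray(beta, dtype=float)
    y = np.asarray(y, dtype=float)
    residual = 0.5 * np.sum((y - beta) ** 2)
    # np.diff(..., n=k+1) takes first differences k+1 times, i.e. D^{k+1} beta
    roughness = np.sum(np.abs(np.diff(beta, n=k + 1)))
    return residual + lam * roughness

y = np.array([1.0, 2.0, 4.0, 7.0])                            # hypothetical observations
print(trend_filter_objective(y, y, lam=1.0, k=0))             # perfect fit, rough estimate
print(trend_filter_objective(np.full(4, 3.5), y, lam=1.0, k=0))  # flat estimate, poor fit
```

The first candidate has zero residual but pays the full penalty, while the flat candidate pays no penalty but has a large residual; \lambda decides which of the two is preferred.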


Discrete Difference Operator (D)

Constant-Order Trend Filtering (k = 0)

y = (y_1,y_2,...,y_n) \in \mathbb{R}^n

1D fused lasso problem

\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}\sum^n_{i=1}(y_i - \beta_i)^2 + \lambda \, \sum_{i=1}^{n-1}|\beta_i - \beta_{i+1}|

Discrete Difference Operator (D)

Linear Trend Filtering (k = 1)

Linear trend filtering problem

\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}\sum^n_{i=1}(y_i - \beta_i)^2 + \lambda \, \sum_{i=1}^{n-2}|\beta_i - 2\beta_{i+1} + \beta_{i+2}|

Discrete Difference Operator (D)

Simpler and Better

\text{1D fused lasso can be rewritten as follows}
\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||D^1 \beta||}_1

D^1 = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 & 0\\ 0 & -1 & 1 & \cdots & 0 & 0\\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1\\ \end{bmatrix} \in \mathbb{R}^{(n-1) \times n}
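The equivalence between the sum form and the matrix form of the fused lasso penalty can be checked numerically. The sketch below builds D^1 explicitly; the helper name `first_difference_matrix` and the test vector are assumptions made for illustration.

```python
import numpy as np

def first_difference_matrix(n):
    """Return D^1 of shape (n-1, n), each row holding the pair (-1, 1)."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0       # -1 on the main diagonal
    D[idx, idx + 1] = 1.0    # +1 on the superdiagonal
    return D

beta = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # hypothetical candidate
D1 = first_difference_matrix(len(beta))

# Penalty written as a sum over neighbouring pairs ...
penalty_sum = sum(abs(beta[i] - beta[i + 1]) for i in range(len(beta) - 1))
# ... and as the l1 norm of D^1 beta.
penalty_matrix = np.sum(np.abs(D1 @ beta))

print(D1.shape)                     # (4, 5)
print(penalty_sum, penalty_matrix)  # the two values agree
```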

Discrete Difference Operator (D)

\text{Replacing } D^1 \text{ with } D^2 \text{ makes it a linear trend filtering problem}
\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||D^2 \beta||}_1

D^2 = \begin{bmatrix} 1 & -2 & 1 & \cdots & 0 & 0 & 0\\ 0 & 1 & -2 & \cdots & 0 & 0 & 0\\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -2 & 1\\ \end{bmatrix} \in \mathbb{R}^{(n-2) \times n}


Discrete Difference Operator (D)

\text{Penalty term for polynomial trend filtering of order k is } ||D^{k+1}\beta||_1
\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||D^{k+1} \beta||}_1

\text{Recursive relation}
D^{k+1} = D^1D^k \in \mathbb{R}^{(n-k-1) \times n}
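The recursive relation can be sketched directly in code. The helper names below are hypothetical; the point to notice is that the D^1 applied on the left shrinks by one row and one column at every step, so that the product is always defined.

```python
import numpy as np

def first_difference_matrix(m):
    """D^1 acting on m points: shape (m-1, m), each row holding (-1, 1)."""
    return np.diff(np.eye(m), axis=0)

def difference_operator(n, order):
    """Build D^order for n points through D^{k+1} = D^1 D^k."""
    D = first_difference_matrix(n)                 # D^1, shape (n-1, n)
    for k in range(1, order):
        # D^k has n-k rows, so the D^1 on the left must act on n-k points.
        D = first_difference_matrix(n - k) @ D     # D^{k+1}, shape (n-k-1, n)
    return D

n = 6
D2 = difference_operator(n, 2)
D3 = difference_operator(n, 3)
print(D2.shape, D3.shape)   # (4, 6) (3, 6)
print(D2[0])                # first row starts with 1, -2, 1
```

A quick sanity check on such a construction is that D^3 maps any quadratic sequence to zero, since third differences of a quadratic vanish.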

Discrete Difference Operator (D)

Understanding the dimensions

D^{k+1} = D^1D^k \in \mathbb{R}^{(n-k-1) \times n}

Let k=1

D^{2} = D^1D^1
\text{Left factor } D^1: (n-2) \times (n-1)
\text{Right factor } D^1: (n-1) \times n
\text{Product } D^2: (n-2) \times n

\text{Taking both factors as } (n-1) \times n \text{ does not work, because the inner dimensions } n \text{ and } n-1 \text{ differ}

For k

D^{k+1} = D^1D^k
\text{Left factor } D^1: (n-k-1) \times (n-k)
\text{Right factor } D^k: (n-k) \times n
\text{Product } D^{k+1}: (n-k-1) \times n


ADMM Algorithm

Our problem

\min_{\beta \in \mathbb{R}^n , \, \alpha \in \mathbb{R}^{n-k-1}} \; \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||\alpha||}_1 \quad \text{s.t.} \quad \alpha = D^{(k+1)}\beta

Augmented Lagrangian

L_{\rho}(\beta , \alpha , z) = \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||\alpha||}_1 + z^T(\alpha - D^{(k+1)}\beta) + \frac{\rho}{2}||\alpha - D^{(k+1)}\beta||_2^2

ADMM Algorithm

Augmented Lagrangian

Using the scaled dual variable

u = \frac{1}{\rho}z

Also, let

r = \alpha - D^{(k+1)} \beta

L_{\rho}(\beta , \alpha , z) = \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||\alpha||}_1 + z^Tr + \frac{\rho}{2}||r||_2^2

Completing the square in the last two terms

z^Tr + \frac{\rho}{2}||r||_2^2 = \frac{\rho}{2}[\frac{2}{\rho}z^Tr + r^Tr + \frac{1}{\rho^2}z^Tz - \frac{1}{\rho^2}z^Tz]
= \frac{\rho}{2}[2u^Tr + r^Tr + u^Tu - u^Tu]
= \frac{\rho}{2}[2u^Tr + r^Tr + u^Tu] - \frac{\rho}{2}{||u||}_2^2
= \frac{\rho}{2}[u^Tr + r^Tu + r^Tr + u^Tu] - \frac{\rho}{2}{||u||}_2^2
= \frac{\rho}{2}[u^T(r+u) + r^T(r+u)] - \frac{\rho}{2}{||u||}_2^2
= \frac{\rho}{2}[(r+u)^T(r+u)] - \frac{\rho}{2}{||u||}_2^2
= \frac{\rho}{2}{||r+u||}_2^2 - \frac{\rho}{2}{||u||}_2^2

Back-substituting r

= \frac{\rho}{2}{||\alpha - D^{(k+1)} \beta +u||}_2^2 - \frac{\rho}{2}{||u||}_2^2
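The completing-the-square identity can be sanity-checked numerically. In this sketch the vectors z and r are random stand-ins (an assumption for illustration, not values from an actual ADMM run) and \rho is an arbitrary positive number.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 2.5
z = rng.normal(size=5)   # unscaled dual variable
r = rng.normal(size=5)   # stand-in for the residual alpha - D beta
u = z / rho              # scaled dual variable

unscaled = z @ r + 0.5 * rho * (r @ r)
scaled = 0.5 * rho * ((r + u) @ (r + u)) - 0.5 * rho * (u @ u)
print(np.isclose(unscaled, scaled))   # both forms give the same number
```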

ADMM Algorithm

Augmented Lagrangian with scaled dual variable

L_{\rho}(\beta , \alpha , u) = \frac{1}{2}{||y - \beta||}_2^2 + \lambda \,{||\alpha||}_1 + \frac{\rho}{2}{||\alpha - D^{(k+1)} \beta +u||}_2^2 - \frac{\rho}{2}{||u||}_2^2

\text{Updates for $\alpha$, $\beta$, $u$}
\beta \leftarrow \left(I + \rho(D^{(k+1)})^TD^{(k+1)}\right)^{-1}\left(y+\rho(D^{(k+1)})^T(\alpha + u)\right)
\alpha \leftarrow S_{\frac{\lambda}{\rho}}(D^{(k+1)}\beta - u)
u \leftarrow u + \alpha - D^{(k+1)} \beta
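The three updates can be put together into a short loop. This is a minimal sketch under stated assumptions rather than a reference implementation: the function names, the default `rho=1.0`, and the fixed iteration count (instead of a convergence test) are all choices made here for illustration, and D^{(k+1)} is formed as a dense matrix, which is only reasonable for small n.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise S_t(v) = sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def standard_admm(y, k, lam, rho=1.0, n_iter=500):
    """Standard ADMM sketch for 0.5*||y - beta||^2 + lam*||D^{k+1} beta||_1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=k + 1, axis=0)    # D^{k+1}, shape (n-k-1, n)
    A = np.eye(n) + rho * D.T @ D              # system matrix of the beta update
    alpha = np.zeros(n - k - 1)
    u = np.zeros(n - k - 1)
    for _ in range(n_iter):
        beta = np.linalg.solve(A, y + rho * D.T @ (alpha + u))
        alpha = soft_threshold(D @ beta - u, lam / rho)
        u = u + alpha - D @ beta
    return beta

# Hypothetical data: with a large lambda and k = 0 the fit collapses to the mean.
print(standard_admm([1.0, 2.0, 4.0, 7.0], k=0, lam=10.0))
```

Using `np.linalg.solve` avoids forming the inverse explicitly; since the system matrix is banded, a banded solver would be the natural next step for larger n.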

ADMM Algorithm

Augmented Lagrangian with scaled dual variable

\text{Derivation for $\beta$}
\text{Omitting the terms that do not depend on $\beta$}
\beta \leftarrow \underset{\beta}{\operatorname{argmin}} \; \frac{1}{2}\|y-\beta\|_{2}^{2}+\frac{\rho}{2}\left\|\alpha-D^{(k+1)} \beta+u\right\|_{2}^{2}
\text{Now differentiating with respect to $\beta$ and equating to 0}
-(y-\beta)-\rho\left(D^{(k+1)}\right)^{\top}\left(\alpha-D^{(k+1)} \beta+u\right)=0
-y+\beta+\rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)} \beta=\rho\left(D^{(k+1)}\right)^{\top}(\alpha+u)
\beta+\rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)} \beta=y+\rho\left(D^{(k+1)}\right)^{\top}(\alpha+u)
\left(I+\rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\right)\beta = y+\rho\left(D^{(k+1)}\right)^{\top}(\alpha+u)
\beta \leftarrow \left(I+\rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\right)^{-1}\left(y+\rho\left(D^{(k+1)}\right)^{\top}(\alpha+u)\right)
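As a check on the derivation, the closed-form \beta should make the gradient of the \beta subproblem vanish. The sketch below tests this with random stand-in values for y, \alpha and u (assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, rho = 8, 1, 1.5
D = np.diff(np.eye(n), n=k + 1, axis=0)    # D^{k+1}, shape (n-k-1, n)
y = rng.normal(size=n)
alpha = rng.normal(size=n - k - 1)
u = rng.normal(size=n - k - 1)

# Closed-form update: solve (I + rho D^T D) beta = y + rho D^T (alpha + u).
beta = np.linalg.solve(np.eye(n) + rho * D.T @ D, y + rho * D.T @ (alpha + u))

# Gradient of 0.5*||y - beta||^2 + (rho/2)*||alpha - D beta + u||^2 at that beta.
grad = -(y - beta) - rho * D.T @ (alpha - D @ beta + u)
print(np.max(np.abs(grad)))   # zero up to floating-point error
```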

ADMM Algorithm

Augmented Lagrangian with scaled dual variable

\text{Derivation for $\alpha$}
\text{Keeping only the terms that depend on $\alpha$}
\alpha \leftarrow \underset{\alpha}{\operatorname{argmin}} \; \lambda \,{||\alpha||}_1 + \frac{\rho}{2}{||\alpha - D^{(k+1)} \beta +u||}_2^2

This can be rewritten as

\alpha \leftarrow \underset{\alpha}{\operatorname{argmin}} \; {||\alpha||}_1 + \frac{1}{2(\frac{\lambda}{\rho})}{||\alpha - (D^{(k+1)} \beta -u)||}_2^2

\text{which is elementwise soft-thresholding, } [S_t(v)]_i = \operatorname{sign}(v_i)\max(|v_i| - t, 0)
\alpha \leftarrow S_{\frac{\lambda}{\rho}}(D^{(k+1)}\beta - u)
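The soft-thresholding operator S is a one-liner in NumPy. This is a small sketch with a hypothetical input vector: entries whose magnitude is below the threshold are set to zero, and the rest are shrunk towards zero by the threshold.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise S_t(v) = sign(v) * max(|v| - t, 0)."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

v = np.array([-3.0, -0.5, 0.0, 0.4, 2.0])   # hypothetical values of D beta - u
print(soft_threshold(v, 1.0))               # small entries vanish, large ones shrink by 1
```

The exact zeros produced here are what make \alpha, and therefore D^{(k+1)}\beta, sparse.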

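ADMM Algorithm

Augmented Lagrangian with scaled dual variable

\text{Derivation for $u$}
\text{The unscaled dual update is } z \leftarrow z + \rho(\alpha - D^{(k+1)} \beta) \text{; dividing by $\rho$ gives}
u \leftarrow u + \alpha - D^{(k+1)} \beta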

Specialized ADMM

\min_{\beta \in \mathbb{R}^{n}, \, \alpha \in \mathbb{R}^{n-k}} \; \frac{1}{2}\|y-\beta\|_{2}^{2}+\lambda\left\|D^{(1)} \alpha\right\|_{1} \quad \text{s.t.} \quad \alpha=D^{(k)} \beta

Augmented Lagrangian

L_{\rho}(\beta, \alpha, u)=\frac{1}{2}\|y-\beta\|_{2}^{2}+\lambda\left\|D^{(1)} \alpha\right\|_{1}+\frac{\rho}{2}\left\|\alpha-D^{(k)} \beta+u\right\|_{2}^{2}-\frac{\rho}{2}{||u||}_{2}^{2}

\text{Updates for $\alpha$, $\beta$, $u$}
\beta \leftarrow \left(I + \rho(D^{(k)})^TD^{(k)}\right)^{-1}\left(y+\rho(D^{(k)})^T(\alpha + u)\right)
\alpha \leftarrow \underset{\alpha \in \mathbb{R}^{n-k}}{\operatorname{argmin}} \frac{1}{2}\left\|D^{(k)} \beta-u-\alpha\right\|_{2}^{2}+\frac{\lambda}{\rho}\left\|D^{(1)} \alpha\right\|_{1}
u \leftarrow u + \alpha - D^{(k)} \beta

\text{The $\alpha$ update is a 1D fused lasso problem, solved exactly by dynamic programming (DP)}
\alpha \leftarrow DP_{\frac{\lambda}{\rho}} (D^{(k)} \beta-u)
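The structure of the specialized updates can be sketched as follows. One substitution is made and should be kept in mind: instead of the exact dynamic-programming solver (DP), the 1D fused lasso subproblem is solved approximately by projected gradient on its dual, purely to keep the example short. All function names, the default `rho=1.0`, and the iteration counts are assumptions for illustration.

```python
import numpy as np

def fused_lasso_1d(v, tau, n_iter=200):
    """Approximate argmin_a 0.5*||v - a||^2 + tau*||D^1 a||_1.

    Assumption: projected gradient on the dual stands in for the exact DP solver.
    """
    v = np.asarray(v, dtype=float)
    w = np.zeros(len(v) - 1)                    # one dual variable per difference
    for _ in range(n_iter):
        a = v + np.diff(w, prepend=0.0, append=0.0)      # a = v - (D^1)^T w
        w = np.clip(w + 0.25 * np.diff(a), -tau, tau)    # gradient step, then project
    return v + np.diff(w, prepend=0.0, append=0.0)

def specialized_admm(y, k, lam, rho=1.0, n_iter=100):
    """Specialized ADMM sketch: constraint alpha = D^k beta, penalty on D^1 alpha."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Dk = np.diff(np.eye(n), n=k, axis=0)        # D^{(k)}, shape (n-k, n)
    A = np.eye(n) + rho * Dk.T @ Dk
    alpha = np.zeros(n - k)
    u = np.zeros(n - k)
    for _ in range(n_iter):
        beta = np.linalg.solve(A, y + rho * Dk.T @ (alpha + u))
        alpha = fused_lasso_1d(Dk @ beta - u, lam / rho)  # stand-in for DP
        u = u + alpha - Dk @ beta
    return beta

# Hypothetical data: an exactly linear signal is left unchanged for k = 1.
print(specialized_admm(np.arange(5.0), k=1, lam=5.0))
```

Compared with the standard loop, only the constraint matrix and the \alpha step change; the soft-thresholding step is replaced by a 1D fused lasso solve.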


Why Lasso over Ridge?

Why the scaled form?

Why Specialized over Standard?

Why not (n-1)?

Discrete Difference Operator (D)

Understanding the dimensions

D^{k+1} = D^1D^k \in \mathbb{R}^{(n-k-1) \times n}

Let k=1

D^{2} = D^1D^1

\text{Both factors } (n-1) \times n \text{: the product is not defined, because the inner dimensions } n \text{ and } n-1 \text{ differ}

\text{Left factor } (n-1) \times (n-1) \text{, right factor } (n-1) \times n \text{: the product has } n-1 \text{ rows, but } D^2 \text{ must have } n-2

\text{Left factor } (n-2) \times (n-1) \text{, right factor } (n-1) \times n \text{: the product is } (n-2) \times n \text{, as required}

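Why k+2 points?

\text{For } k = 1 \text{, each row of } D^2 \text{ is a difference of two first differences, so it involves } k+2 = 3 \text{ points}
[f(3) - f(2)] - [f(2) - f(1)] = f(1) - 2f(2) + f(3)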
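The same counting argument holds for any order: one row of D^{k+1} touches exactly k+2 neighbouring points. A small sketch (the helper name `points_per_difference` is hypothetical) confirms this by counting the nonzero entries in a row.

```python
import numpy as np

def points_per_difference(n, k):
    """Number of nonzero entries in one row of D^{k+1} for n points."""
    D = np.diff(np.eye(n), n=k + 1, axis=0)
    return int(np.count_nonzero(D[0]))

for k in range(3):
    print(k, points_per_difference(8, k))   # second column equals k + 2

f = np.array([2.0, 5.0, 8.0])               # three points on a straight line
print(f[0] - 2 * f[1] + f[2])               # second difference of a line is 0
```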


Thank you, Ma'am


By Incredible us

