A Novel Low-Rank Hypergraph Feature Selection Method

Why Dimension Reduction?

  • Huge datasets

  • Multi-view techniques add richer, more informative data

  • Curse of Dimensionality

  • Multi-view data often leads to overfitting

  • High-dimensional data usually has a low-dimensional intrinsic core

Dimension Reduction

  • Subspace Learning

    • PCA
    • LPP
  • Feature Selection

    • Low-Rank Approximation
    • Structure Learning

Introduction

  • Low-Rank Constraint

    • Reducing noise
    • Removing Outliers
    • Global Structure
  • Regularization Term

    • Sparse Learning
  • Hypergraph Constraint

    • Local Structure

Method: Low-Rank Approx.

\underset{W}{\text{minimize}} \quad ||Y - XW - eb||_F^2 \\ X \in \mathbb{R}^{n \times d}, \; Y \in \mathbb{R}^{n \times c}, \; e \in \mathbb{R}^{n \times 1} \text{ (all-ones)}, \; b \in \mathbb{R}^{1 \times c}
W = AB \in \mathbb{R}^{d \times c} \, , \; \mathrm{rank}(W) = r \\ A \in \mathbb{R}^{d \times r} \, , \; A^TA = I \\ B \in \mathbb{R}^{r \times c}
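As a quick illustration of this factorization (not part of the original slides), here is a minimal numpy sketch showing that any rank-r weight matrix can be written as W = AB with a column-orthonormal A; the sizes d, c, r below are arbitrary placeholders.

```python
# Minimal sketch: factor a rank-r matrix W into W = A B with A^T A = I via truncated SVD.
# The dimensions below are illustrative only.
import numpy as np

d, c, r = 50, 4, 3
rng = np.random.default_rng(0)
W_full = rng.standard_normal((d, r)) @ rng.standard_normal((r, c))  # a rank-r d x c matrix

U, s, Vt = np.linalg.svd(W_full, full_matrices=False)
A = U[:, :r]                             # d x r, columns orthonormal: A^T A = I
B = np.diag(s[:r]) @ Vt[:r, :]           # r x c
print(np.allclose(A.T @ A, np.eye(r)))   # True
print(np.allclose(A @ B, W_full))        # True: W = A B
```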

Method: Adding Regularization

\underset{W}{\text{minimize}} \quad ||Y - XW - eb||_F^2 + \lambda ||W||_{2,1} \\ \\ \text{Larger } \lambda \longrightarrow \text{more rows of } W \text{ driven to zero} \longrightarrow \text{more features removed}
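To make the row-sparsity effect of the l2,1 penalty concrete, here is a small numpy sketch (the matrix values are made up): the norm sums the l2 norms of the rows of W, so shrinking it zeroes whole rows, i.e., whole features.

```python
# The l2,1 norm of W is the sum of the l2 norms of its rows; a zero row = a removed feature.
import numpy as np

W = np.array([[0.9, -0.4],
              [0.0,  0.0],   # this feature's row was driven to zero -> feature removed
              [0.2,  0.1]])

row_norms = np.linalg.norm(W, axis=1)
print(row_norms.sum())                       # ||W||_{2,1}
print(np.flatnonzero(row_norms > 1e-8))      # indices of the features that survive
```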

Method: Hypergraph

\text{Incidence matrix: } S \in \mathbb{R}^{n \times m} \text{ (samples} \times \text{hyperedges)} \\ \text{Coefficient matrix: } \hat{S} = D_v^{-\frac{1}{2}} S D_e^{-1} S^T D_v^{-\frac{1}{2}} \\ D_v: \text{ diagonal matrix of vertex degrees (row-sums of } S\text{)} \\ D_e: \text{ diagonal matrix of hyperedge degrees (column-sums of } S\text{)}
\text{Laplacian: } L = D - \hat{S} \\ D: \text{ diagonal matrix of row-sums of } \hat{S}
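A minimal sketch of these hypergraph quantities, assuming one hyperedge per sample formed by its k nearest neighbours with unit edge weights (a common construction; the slides do not specify how S is built):

```python
# Build S, D_v, D_e, the coefficient matrix S_hat and the Laplacian L = D - S_hat.
# The kNN hyperedge construction and unit weights are assumptions, not the authors' recipe.
import numpy as np

def hypergraph_laplacian(X, k=3):
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    S = np.zeros((n, n))                          # n vertices x n hyperedges
    for j in range(n):
        S[np.argsort(dist[j])[:k + 1], j] = 1.0   # edge j = sample j and its k NNs
    Dv = np.diag(S.sum(axis=1))                   # vertex degrees (row-sums of S)
    De = np.diag(S.sum(axis=0))                   # hyperedge degrees (column-sums of S)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Dv)))
    S_hat = Dv_inv_sqrt @ S @ np.linalg.inv(De) @ S.T @ Dv_inv_sqrt
    L = np.diag(S_hat.sum(axis=1)) - S_hat        # L = D - S_hat
    return S_hat, L
```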

Method: Hypergraph Contd.

R(W) = \frac{1}{2} \sum_{i,j=1}^{n} \hat{s}_{ij}||\hat{y}^i - \hat{y}^j||_2^2 \\ \qquad \;\; = tr(\hat{Y}^T L \hat{Y}) \, , \quad \hat{Y} = XW = XAB \\ \qquad \;\; = tr(B^TA^TX^TLXAB)
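A quick numerical check of this identity (random symmetric coefficient matrix and predictions, for illustration only):

```python
# Verify (1/2) * sum_ij s_ij ||y^i - y^j||^2 == tr(Yhat^T L Yhat) with L = D - S_hat.
import numpy as np

rng = np.random.default_rng(1)
n, c = 6, 3
S_hat = rng.random((n, n)); S_hat = (S_hat + S_hat.T) / 2
L = np.diag(S_hat.sum(axis=1)) - S_hat
Yhat = rng.standard_normal((n, c))              # rows play the role of y^i

lhs = 0.5 * sum(S_hat[i, j] * np.sum((Yhat[i] - Yhat[j]) ** 2)
                for i in range(n) for j in range(n))
rhs = np.trace(Yhat.T @ L @ Yhat)
print(np.isclose(lhs, rhs))                     # True
```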

R(W) is the local-structure (hypergraph) term added to the objective below.

Method: Objective Function

\underset{A, B, b}{\text{minimize}} \quad ||Y-XAB-eb||_F^2 + \alpha \, tr(B^TA^TX^TLXAB) + \lambda \, ||AB||_{2,1} \\ \text{subject to } A^TA = I

Optimization of the Objective

\text{Denote the centering matrix } H = I - \frac{1}{n} ee^T
\underset{A, B, b}{\text{minimize}} \quad ||Y-XAB-eb||_F^2 + \alpha \, tr(B^TA^TX^TLXAB) + \lambda \, tr(B^TA^TDAB)
\text{where } D = \mathrm{diag}\!\left(\frac{1}{2||w^i||_2}\right) \text{ and } w^i \text{ is the } i\text{-th row of } W = AB
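The reweighting matrix D can be sketched as below; D is recomputed from the current W = AB at each iteration, and the small eps safeguard against zero rows is my addition:

```python
# D = diag(1 / (2 ||w^i||_2)) over the rows of W = A B (eps avoids division by zero).
import numpy as np

def l21_weight_matrix(A, B, eps=1e-8):
    W = A @ B
    row_norms = np.linalg.norm(W, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
```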

Finding b: Fix A and B

\mathcal{J}(b): \text{ the objective as a function of } b \text{ (with } A, B \text{ fixed)} \\ \frac{\partial{\mathcal{J}(b)}}{\partial{b}} = 0
b = \frac{1}{n}\left(e^TY - e^TXAB\right)
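A direct transcription of this b-step as a numpy sketch (X is n x d with rows as samples, matching the loss above):

```python
# b-step: with A and B fixed, b = (1/n) e^T (Y - X A B).
import numpy as np

def update_b(X, Y, A, B):
    n = X.shape[0]
    e = np.ones((n, 1))
    return (e.T @ (Y - X @ A @ B)) / n    # 1 x c row vector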

Finding B: Fix A and b

\mathcal{J}(B): \text{ the objective as a function of } B \text{ (with } A, b \text{ fixed)} \\ \frac{\partial{\mathcal{J}(B)}}{\partial{B}} = 0
B = \left(A^TX^THHXA + \alpha \, A^TX^THLHXA + \lambda \, A^TDA\right)^{-1} A^TX^THHY
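The closed-form B-step as a numpy sketch; the small ridge term is my addition for numerical stability and is not in the slides:

```python
# B-step: B = (A^T X^T H H X A + alpha A^T X^T H L H X A + lambda A^T D A)^{-1} A^T X^T H H Y
import numpy as np

def update_B(X, Y, A, L, D, H, alpha, lam, ridge=1e-8):
    M = (A.T @ X.T @ H @ H @ X @ A
         + alpha * A.T @ X.T @ H @ L @ H @ X @ A
         + lam * A.T @ D @ A)
    return np.linalg.solve(M + ridge * np.eye(M.shape[0]), A.T @ X.T @ H @ H @ Y)
```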

Finding A: Fix B and b

\text{Substituting the optimal } b \text{ and } B \text{ back, the problem over } A \text{ becomes}
\underset{A^TA = I}{\text{maximize}} \quad tr\!\left(\left(A^T\left(X^THHX + \alpha \, X^THLHX + \lambda D\right)A\right)^{-1} A^TX^THHYY^THHXA\right)
\text{solved by the top-}r\text{ generalized eigenvectors of } \left(X^THHYY^THHX, \; X^THHX + \alpha \, X^THLHX + \lambda D\right)
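The A-step therefore reduces to a generalized eigenproblem; a sketch using scipy is below. The symmetrization/ridge and the final QR step that restores A^T A = I are my additions (the trace objective is unchanged by them). Alternating the b-, B- and A-steps together with the D update gives the full solver.

```python
# A-step: take the top-r generalized eigenvectors of (S2, S1), where
# S1 = X^T H H X + alpha X^T H L H X + lambda D and S2 = X^T H H Y Y^T H H X.
import numpy as np
from scipy.linalg import eigh

def update_A(X, Y, L, D, H, alpha, lam, r):
    S1 = X.T @ H @ H @ X + alpha * X.T @ H @ L @ H @ X + lam * D
    S2 = X.T @ H @ H @ Y @ Y.T @ H @ H @ X
    S1 = (S1 + S1.T) / 2 + 1e-8 * np.eye(S1.shape[0])   # keep S1 symmetric positive definite
    _, vecs = eigh(S2, S1)                               # eigenvalues in ascending order
    A = vecs[:, -r:]                                     # top-r generalized eigenvectors
    Q, _ = np.linalg.qr(A)                               # re-impose A^T A = I
    return Q
```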

Selected Features

We've found W = AB, which:

  • Scores each feature's association with the outputs via its row norm ||w^i||_2

  • Marks redundant features as removed by driving their rows to zero

  • Can be used to construct a new, reduced dataset from the surviving features (sketched below)
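A minimal sketch of this last step (the threshold and optional top-k are my choices):

```python
# Rank features by the l2 norms of the rows of W = A B and keep the nonzero (or top-k) ones.
import numpy as np

def select_features(X, A, B, top_k=None, tol=1e-6):
    scores = np.linalg.norm(A @ B, axis=1)          # importance of each feature
    keep = (np.argsort(scores)[::-1][:top_k] if top_k
            else np.flatnonzero(scores > tol))
    return X[:, keep], keep                         # reduced dataset + selected indices
```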

Sample Results with SVM
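A hedged sketch of the kind of evaluation this slide refers to: compare cross-validated SVM accuracy on the selected features against all features (the classifier settings and 5-fold CV protocol are illustrative assumptions, using scikit-learn):

```python
# Compare cross-validated SVM accuracy on the selected features vs. all features.
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def evaluate(X_selected, X_all, y):
    acc_selected = cross_val_score(SVC(kernel="rbf"), X_selected, y, cv=5).mean()
    acc_all = cross_val_score(SVC(kernel="rbf"), X_all, y, cv=5).mean()
    return acc_selected, acc_all
```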

Questions

  • Why can't we just use splines instead of hypergraphs? (Splines + Low-Rank Approx. + Regularization)

  • Why can't we just use local regression instead of hypergraphs?

Thanks!

Here's the potato
