Select and submit your topic for group project 2 (on Carmen), by this Friday, 11:59pm
Plan for Today:
Principal Component Analysis
Eigenvectors, Eigenvalues
Lagrange Multipliers
Singular Value Decomposition (SVD)
Unsupervised Learning in Astronomy
Supervised learning maps input \( x \) to output \( y \) with specific target variables
Unsupervised learning reveals inherent data structure of \( x \) without explicit target variables
Two main unsupervised tasks: dimension reduction and clustering
Unsupervised methods particularly valuable in astronomy for understanding complex datasets
Introduction to Dimension Reduction
High-dimensional astronomical data (spectra, images, time series) requires compression for interpretation
Dimension reduction projects data from high-dimensional space to lower-dimensional manifolds
PCA aims to compress/decompress data while preserving essential information
Visual understanding of high-dimensional data becomes possible through effective dimension reduction
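As a concrete sketch of compress/decompress (a hypothetical example using scikit-learn and synthetic "spectra" built from three latent components, not real survey data), PCA reduces each 500-bin spectrum to three numbers and reconstructs it with little loss:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 noisy "spectra" with 500 wavelength bins each,
# generated from only 3 latent physical parameters.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 3))          # 3 hidden parameters per object
basis = rng.normal(size=(3, 500))           # their spectral signatures
spectra = latent @ basis + 0.01 * rng.normal(size=(100, 500))

pca = PCA(n_components=3)
z = pca.fit_transform(spectra)              # compress: 500 bins -> 3 numbers
reconstructed = pca.inverse_transform(z)    # decompress back to 500 bins

# Nearly all variance is captured by the 3 components
print(pca.explained_variance_ratio_.sum())  # close to 1.0
```

Here the compression ratio is 500:3 per spectrum, and the reconstruction error is set by the noise level rather than by lost signal.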
First paper !
Physical Motivation for Dimension Reduction
Astrophysical processes are governed by finite latent variables despite high-dimensional observations
Stellar chemical abundances show correlations through common nucleosynthesis processes
Quasar spectra complexity stems from few physical parameters (accretion rate, viewing angle, redshift)
Benefits include computational efficiency and improved data transmission for space missions
Data as Realizations of a Probability Distribution
Observed data points can be viewed as realizations from an unknown probability distribution \( P(X) \)
Assumption: observations are independent and identically distributed (I.I.D.) samples
Dataset consists of I.I.D. samples \( \mathcal{X} = \{\mathbf{x}_1, ..., \mathbf{x}_n\} \), where each \( \mathbf{x}_i \in \mathbb{R}^D \) is a \( D \)-dimensional vector
Dimension reduction asks: does \( P(X) \) live on a lower-dimensional manifold?
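A minimal numpy sketch of this question, with synthetic data constructed to lie on a 2-D linear subspace of \( \mathbb{R}^5 \): the eigenvalues of the sample covariance expose the lower-dimensional structure.

```python
import numpy as np

# Sketch: D = 5 observed dimensions, but P(X) lives on a 2-D linear subspace.
rng = np.random.default_rng(1)
z = rng.normal(size=(1000, 2))              # 2 latent coordinates
A = rng.normal(size=(2, 5))                 # embedding into R^5
X = z @ A                                   # i.i.d. samples x_i in R^5

S = np.cov(X, rowvar=False)                 # sample covariance matrix
eigvals = np.linalg.eigvalsh(S)
# Only 2 eigenvalues are (numerically) nonzero -> the data is 2-dimensional
print(np.sum(eigvals > 1e-10))              # 2
```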
Principal Component Analysis: A Dual Perspective
PCA finds an orthogonal lower-dimensional projection \( \mathbf{z} \in \mathbb{R}^M \) of data \( \mathbf{x} \in \mathbb{R}^D \), where \( M < D \)
We assume the data are centered with zero mean:
\( E[\mathbf{x}] = \bar{\mathbf{x}} \simeq \frac{1}{N}\sum_{i=1}^N \mathbf{x}_i = \mathbf{0} \), achieved by subtracting the sample mean from each data point
Variance Maximization: Find directions of maximum variance in high-dimensional space
Reconstruction Error Minimization: Minimize squared error between original data and reconstruction
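The equivalence of the two perspectives can be checked numerically (a sketch on synthetic data, not part of the lecture's derivation): the variance of the projection onto the top eigenvector equals the largest eigenvalue, and the mean squared reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
X = X - X.mean(axis=0)                      # center the data

S = (X.T @ X) / len(X)                      # covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
b1 = eigvecs[:, -1]                         # eigenvector of largest eigenvalue

# Perspective 1: b1 maximizes the variance of the projection z = X b
var_b1 = np.var(X @ b1)                     # equals the largest eigenvalue

# Perspective 2: projecting onto b1 minimizes squared reconstruction error;
# the residual error equals the sum of the discarded eigenvalues
recon = np.outer(X @ b1, b1)
err = np.mean(np.sum((X - recon) ** 2, axis=1))
print(np.isclose(err, eigvals[:-1].sum()))  # True
```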
Vector Projection Fundamentals
Vector projection of \( \mathbf{x} \) onto unit vector \( \mathbf{b} \) is given by \( \pi(\mathbf{x}) = (\mathbf{x}^T \mathbf{b})\,\mathbf{b} \)
Scalar term \( \mathbf{x}^T \mathbf{b} \) measures how much of vector \( \mathbf{x} \) points in direction \( \mathbf{b} \)
To maximize the variance remaining after the first component is removed, choose the eigenvector with the second-largest eigenvalue: this is the second principal component
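The projection formula above can be sketched in a few lines of numpy:

```python
import numpy as np

x = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])                    # unit vector

scalar = x @ b                              # how much of x points along b
projection = scalar * b                     # pi(x) = (x^T b) b
print(projection)                           # [3. 0.]
```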
Mathematical Induction for PCA
Need rigorous proof that \( m \)-th principal component corresponds to \( m \)-th largest eigenvalue
Mathematical induction provides elegant proof without repetitive calculations
Two ingredients: the base case and the inductive step
Like domino effect: prove first case and that each case implies the next
Setting Up the Induction
Base case: First principal component \( \mathbf{b}_1 \) is eigenvector with largest eigenvalue \( \lambda_1 \) (already proven)
Inductive step: assuming the statement holds for the first \( M-1 \) components, prove the \( M \)-th principal component is the eigenvector with the \( M \)-th largest eigenvalue \( \lambda_M \)
Key insight: \( M \)-th component maximizes variance in residual data after removing first \( M-1 \) components
Residual Data Analysis
Original data matrix: \( \mathbf{X} \in \mathbb{R}^{N \times D} \) (each row is a data point)
Residual data after projecting out the first \( M-1 \) principal components: \( \widehat{\mathbf{X}} = \mathbf{X} - \sum_{i=1}^{M-1} \mathbf{X}\mathbf{b}_i\mathbf{b}_i^T \)
The first \( M-1 \) eigenvalues of the residual covariance matrix \( \widehat{\mathbf{S}} \) are zero (corresponding to \( \mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_{M-1} \))
Without loss of generality, assuming \( \lambda_M \geq \lambda_{M+1} \geq \ldots \geq \lambda_D \), the largest eigenvalue of \( \widehat{\mathbf{S}} \) is \( \lambda_M \)
The corresponding eigenvector is \( \mathbf{b}_M \), which maximizes the variance of the residual data
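A numerical sketch of this deflation step on synthetic data: after subtracting the first principal component, \( \lambda_1 \) drops to (numerically) zero and \( \lambda_2 \) becomes the largest eigenvalue of the residual covariance.

```python
import numpy as np

# Sketch of the deflation argument for the M = 2 case, with synthetic data.
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
X = X - X.mean(axis=0)

S = (X.T @ X) / len(X)                      # covariance matrix
eigvals, B = np.linalg.eigh(S)
eigvals, B = eigvals[::-1], B[:, ::-1]      # sort eigenvalues descending

# Residual data after removing the first principal component b_1
X_hat = X - X @ np.outer(B[:, 0], B[:, 0])
S_hat = (X_hat.T @ X_hat) / len(X_hat)
res_vals = np.linalg.eigvalsh(S_hat)[::-1]

# lambda_1 is deflated to ~0; the largest remaining eigenvalue is lambda_2
print(np.isclose(res_vals[0], eigvals[1]))  # True
```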
Completing the Induction Proof
We've shown \( M \)-th principal component is eigenvector \( \mathbf{b}_M \) with \( M \)-th largest eigenvalue \( \lambda_M \)
This completes the inductive step: if statement holds for first \( M-1 \) components, it holds for \( M \)-th component
By mathematical induction, all principal components are eigenvectors of the covariance matrix, ordered by decreasing eigenvalue
Practical implication: Compute eigendecomposition once, select components in order of decreasing eigenvalues
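Putting the practical implication together, a minimal eigendecomposition-based PCA sketch (a hypothetical helper for illustration, not a reference implementation):

```python
import numpy as np

def pca(X, M):
    """Project data X (N x D) onto its first M principal components."""
    X = X - X.mean(axis=0)                  # center the data
    S = (X.T @ X) / len(X)                  # covariance matrix, computed once
    eigvals, eigvecs = np.linalg.eigh(S)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort descending
    B = eigvecs[:, order[:M]]               # top-M eigenvectors as columns
    return X @ B, B                         # scores z (N x M) and basis

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
z, B = pca(X, M=2)
print(z.shape)                              # (200, 2)
```

The eigendecomposition is done once; choosing a different \( M \) afterwards only means keeping a different number of columns of the sorted eigenvector matrix.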