\[A = U \Sigma V^T, \quad A \in \mathbb{R}^{m \times n}\]
- \(U\): left singular vectors
- \(V\): right singular vectors
- \(\Sigma\): singular values
\[A \approx A_k = \sigma_1 u_1 v^T_1 + \sigma_2 u_2 v^T_2 + \cdots + \sigma_k u_k v^T_k\]
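The rank-\(k\) truncation above can be sketched with a full SVD in NumPy. This is a minimal illustration, not part of the original deck; the helper name `truncated_svd` and the test matrix are assumptions made for the example.

```python
import numpy as np

def truncated_svd(A, k):
    """Return A_k, the sum of the top-k terms sigma_i * u_i * v_i^T."""
    # Thin SVD: U is m x r, s has r entries, Vt is r x n, with r = min(m, n).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # NumPy returns singular values in descending order, so keep the first k.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))   # hypothetical test matrix
k = 5
A_k = truncated_svd(A, k)

# Eckart-Young: the spectral-norm error of A_k equals sigma_{k+1}.
s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_k, 2)
print(err, s[k])
```

The two printed values should agree up to floating-point rounding, which is the baseline that the randomized variants try to approach at lower cost.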
Randomized SVD
Input: matrix \(A\), rank \(k\)
Output: \(\sigma_1 u_1 v^T_1 + \cdots + \sigma_k u_k v^T_k\)
\( \textbf{Input: } A \in \mathbb{R}^{m \times n}\), target rank \(k\)
- Randomly generate \(n \times k\) matrix \(\Omega \sim \mathcal{N}(0,1)\)
- \(Y \leftarrow A\Omega\)
- Perform QR decomposition on \(Y\), \(\ \ Y =: QR\)
- \(B \leftarrow Q^\top A\)
- Perform SVD on \(B\), \(\ \ B =: \tilde{U} \Sigma V^\top\)
- \(U \leftarrow Q\tilde{U}\)
\(\textbf{Return: } A \approx \sigma_1 u_1 v_1^\top + \cdots + \sigma_k u_k v_k^\top\)
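The steps above map almost line for line onto NumPy. The following is a sketch under the assumption of a dense in-memory matrix; the function name `randomized_svd_basic` and the `seed` parameter are introduced only for this example.

```python
import numpy as np

def randomized_svd_basic(A, k, seed=0):
    """Basic randomized SVD without oversampling (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k))   # n x k Gaussian test matrix
    Y = A @ Omega                          # m x k sample of the range of A
    Q, _ = np.linalg.qr(Y)                 # reduced QR: Q is m x k, orthonormal columns
    B = Q.T @ A                            # small k x n matrix
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_tilde                        # lift left singular vectors back to R^m
    return U, s, Vt

# Usage on a hypothetical matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd_basic(A, k=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(rel_err)
```

Because the example matrix has exact rank 5, a rank-5 sketch captures its whole range and the relative error should be at rounding level. For matrices with a slowly decaying spectrum the error is larger.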
\( \textbf{Input: } A \in \mathbb{R}^{m \times n}\), target rank \(k\), oversampling parameter \(p\)
- Let \(l = k + p\)
- Randomly generate \(n \times l\) matrix \(\Omega \sim \mathcal{N}(0,1)\)
- \(Y \leftarrow A\Omega\)
- Perform QR decomposition on \(Y\), \(\ \ Y =: QR\)
- \(B \leftarrow Q^\top A\)
- Perform SVD on \(B\), \(\ \ B =: \tilde{U} \Sigma V^\top\)
- \(U \leftarrow Q\tilde{U}\)
\(\textbf{Return: } A \approx \sigma_1 u_1 v_1^\top + \cdots + \sigma_k u_k v_k^\top\)
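A sketch of the oversampled variant: sample \(l = k + p\) directions, then keep only the leading \(k\) singular triplets. The function name `randomized_svd`, the default `p=10`, and the synthetic spectrum are assumptions for illustration only.

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Randomized SVD with oversampling parameter p (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + p
    Omega = rng.standard_normal((n, l))   # n x l Gaussian test matrix
    Y = A @ Omega                          # m x l
    Q, _ = np.linalg.qr(Y)                 # m x l, orthonormal columns
    B = Q.T @ A                            # l x n
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_tilde
    # Discard the p extra directions and return a rank-k factorization.
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical matrix with geometrically decaying singular values.
rng = np.random.default_rng(2)
m, n, k = 300, 120, 10
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.8 ** np.arange(n)
A = (U0 * sigma) @ V0.T

for p in (0, 10):
    U, s, Vt = randomized_svd(A, k, p=p)
    err = np.linalg.norm(A - (U * s) @ Vt, 2)
    # sigma[k] is sigma_{k+1}, the best possible rank-k spectral error.
    print(p, err, sigma[k])
```

No rank-\(k\) approximation can have a spectral error below \(\sigma_{k+1}\), so the printed error is always at least `sigma[k]`. Oversampling typically brings the error closer to that bound.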
\( \textbf{Input: } A \in \mathbb{R}^{m \times n}\), target rank \(k\), oversampling parameter \(p\)
- Let \(l = k + p\)
- Randomly generate \(n \times l\) matrix \(\Omega \sim \mathcal{N}(0,1)\)
- \(Y \leftarrow A (A^\top A)^2 \Omega\) (subspace iteration)
- Perform QR decomposition on \(Y\), \(\ \ Y =: QR\)
- \(B \leftarrow Q^\top A\)
- Perform SVD on \(B\), \(\ \ B =: \tilde{U} \Sigma V^\top\)
- \(U \leftarrow Q\tilde{U}\)
\(\textbf{Return: } A \approx \sigma_1 u_1 v_1^\top + \cdots + \sigma_k u_k v_k^\top\)
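A sketch of the variant with subspace iteration, where \(Y = A (A^\top A)^q \Omega\) and \(q = 2\) matches the step above. The name `randomized_svd_power` and the parameter `q` are assumptions for this example. The products are applied one at a time so that \(A^\top A\) is never formed explicitly.

```python
import numpy as np

def randomized_svd_power(A, k, p=10, q=2, seed=0):
    """Randomized SVD with oversampling and q steps of subspace iteration (sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + p
    Omega = rng.standard_normal((n, l))
    Y = A @ Omega
    for _ in range(q):
        # One application of A A^T, i.e. Y <- A (A^T Y), using only thin products.
        # Practical implementations often re-orthonormalize Y between steps
        # to limit rounding error; this is omitted here to mirror the slide.
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ A
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_tilde
    return U[:, :k], s[:k], Vt[:k, :]

# Usage on a hypothetical matrix of exact rank 5.
rng = np.random.default_rng(4)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd_power(A, k=5, p=5, q=2)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(rel_err)
```

The iteration sharpens the decay of the sampled spectrum, since the singular values of \(A (A^\top A)^q\) are \(\sigma_i^{2q+1}\). This is most useful when the singular values of \(A\) decay slowly.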
\( \textbf{Input: } A \in \mathbb{R}^{m \times n}\), target rank \(k\), oversampling parameter \(p\), \(G\) GPUs
- Let \(l = k + p\). Partition \(A\) by rows: \(A = [A_0;\ A_1;\ \cdots;\ A_{G-1}]\), \(A_i\) is on GPU \(i\)
- Randomly generate \(n \times l\) matrix \(\Omega \sim \mathcal{N}(0,1)\)
- \(\textbf{On each GPU } i\), do \(Y_i \leftarrow A_i \Omega\) and \(Y_i =: \bar{Q}_i R_i\)
- \(\textbf{TSQR Reduce:}\)
- Stack \([R_0;\ R_1;\ \cdots;\ R_{G-1}] =: T \cdot R\)
- Split \(T = [T_0;\ T_1;\ \cdots;\ T_{G-1}]\)
- \(\textbf{On each GPU } i\), do \(Q_i \leftarrow \bar{Q}_i T_i\) and \(B_i \leftarrow Q_i^\top A_i\)
- \(\textbf{Reduce: }\) \(B \leftarrow \sum_i B_i\)
- \(\textbf{On GPU 0}\), \(\ B =: \tilde{U} \Sigma V^\top\)
- \(\textbf{On each GPU } i\), \(\ U_i \leftarrow Q_i \tilde{U}\)
\(\textbf{Return: } A \approx \sigma_1 u_1 v_1^\top + \cdots + \sigma_k u_k v_k^\top\)
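The multi-GPU steps above can be checked on a single machine by simulating each GPU with one row block of \(A\). This is a CPU-only NumPy sketch: the name `distributed_rsvd_sim` is hypothetical, and the Python `sum` and `vstack` calls stand in for the real collective operations (gather, reduce, broadcast), which are not shown.

```python
import numpy as np

def distributed_rsvd_sim(A, k, p, G, seed=0):
    """Simulate the row-partitioned randomized SVD with TSQR on G 'devices'."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + p
    blocks = np.array_split(A, G, axis=0)        # A_i lives on simulated GPU i
    assert all(A_i.shape[0] >= l for A_i in blocks), "each block needs >= l rows"
    Omega = rng.standard_normal((n, l))           # same Omega on every device

    # Local step: Y_i = A_i Omega, then Y_i = Qbar_i R_i with R_i of size l x l.
    local = [np.linalg.qr(A_i @ Omega) for A_i in blocks]

    # TSQR reduce: stack the small R_i factors and factor them once more.
    R_stack = np.vstack([R_i for _, R_i in local])   # (G*l) x l
    T, _ = np.linalg.qr(R_stack)                     # T is (G*l) x l
    T_blocks = np.split(T, G, axis=0)                # each T_i is l x l

    # Back on each device: Q_i = Qbar_i T_i and B_i = Q_i^T A_i.
    Q_blocks = [Qbar @ T_i for (Qbar, _), T_i in zip(local, T_blocks)]
    B = sum(Q_i.T @ A_i for Q_i, A_i in zip(Q_blocks, blocks))   # reduce, l x n

    # Small SVD on one device, then each device forms its rows of U.
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U_blocks = [Q_i @ U_tilde[:, :k] for Q_i in Q_blocks]
    return U_blocks, s[:k], Vt[:k, :]

# Usage on a hypothetical matrix of exact rank 5, split across 4 simulated GPUs.
rng = np.random.default_rng(5)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U_blocks, s, Vt = distributed_rsvd_sim(A, k=5, p=5, G=4)
U = np.vstack(U_blocks)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(rel_err)
```

Stacking the \(Q_i\) gives the same orthonormal basis that a single QR of the full \(Y\) would span, which is why the sum of the \(B_i\) equals \(Q^\top A\).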
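Cost of communication per step:
- TSQR Reduce (stacking the \(l \times l\) factors \(R_i\)): \(O(l^2)\)
- Reduce \(B \leftarrow \sum_i B_i\) (each \(B_i\) is \(l \times n\)): \(O(nl)\) (most crucial, \(\because n \gg l \approx k\))
- Sending \(\tilde{U}\) from GPU 0 to each GPU: \(O(lk)\)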
By Gino