Linear-time Solution of \[(\forall i)(XA_i-B_i Y=C_i)\]
Simultaneous Sylvester Equations
James B. Wilson, joint with Joshua Maglione
https://slides.com/jameswilson-3/quick-sylver/
Control Theory \(101-\varepsilon\)
Control System
- Given: system governed by diff-eq \[x'(t)=\mathsf{Model}(t,x(t), c(t))\] and observations \[y(t)=\mathsf{Obs}(t,x(t),c(t))\]
- Goal I: choose \(c(t)\) to "control" \(x(t)\) to desired states, at least as observed by \(y(t)\).
- Goal II: estimate \(x(t)\) from \(y(t)\).
Linear Time Independent (LTI) CS
\(\exists\) matrices \(A,B,C,D\) where \[x'(t)=\mathsf{Model}(t,x,c)=Ax(t)+Bc(t)\] \[y(t)=\mathsf{Obs}(t,x,c) = Cx(t)+Dc(t)\]
stabilizable if \(\exists\) matrix \(F\) \[c(t)=Fx(t)\] where \[x'(t)=Ax(t)+BFx(t)\] converges to "stable".
\(\exists\) matrices \(A,B,C,D\) where \[x'(t)=\mathsf{Model}(t,x,c)=Ax(t)+Bc(t)\] \[y(t)=\mathsf{Obs}(t,x,c) = Cx(t)+Dc(t)\]
stabilizable if \(\exists\) matrix \(F\) \(c(t)=Fx(t)\).
reduce (Schur Form) \[\begin{bmatrix} A & B\\ C & D\end{bmatrix}\sim\begin{bmatrix} I & A^{-1}B\\ 0 & G\end{bmatrix}\qquad G=D-CA^{-1}B\]
new problem \(\tilde{x}'(t)=H\tilde{x}(t)+Fy(t)+Gc(t)\) where \(H\) is Hurwitz: \(Re(\lambda)<0\) on \(\lambda\in Spec(H)\)
(LTI) Control System
Assume: \(\exists\) matrices \(A,B,C,D\) where \[x'(t)=\mathsf{Model}(t,x,c)=Ax(t)+Bc(t)\] \[y(t)=\mathsf{Obs}(t,x,c) = Cx(t)+Dc(t)\]
stabilizable if \(\exists\) matrix \(F\) \(c(t)=Fx(t)\)
Schur Form: \(G=D-CA^{-1}B\)
new problem \(\tilde{x}'(t)=H\tilde{x}(t)+Fy(t)+Gc(t)\) where \(H\) is Hurwitz: \(Re(\lambda)<0\) on \(\lambda\in Spec(H)\)
Luenberger: Estimate \(x(t)\) by solving for \(X\) \[HX-XA=-FC\] \(\|\tilde{x}(t)-Xx(t)\|\to 0\)
(LTI) Control System
LTI Control Systems
SOLVE FOR \(X\) IN \[HX-XA=-FC\]
Equation generalizes in shape hundreds of ways.
Really interested?
- Peter Benner's Control Theory Basics, in Handbook of Linear Algebra, Chapman & Hall, 2007
Solving \(XA-BY=C\)
Simplifying assumptions
- matrices are all approximately \(d\times d\)
- field operations are exact (numerical stability not considered here)
Case: \(B\) invertible
\[\begin{aligned} XA-BY &= C\\ Y & = B^{-1}(C+XA)\end{aligned}\]
So every \(X\) has a \(Y\).
Compute \(Y\) from \(X\) in \(O(d^3)\)-ops.
Case: \(B\) left-invertible
(tall-skinny \(B\) full rank)
\[\begin{aligned} XA-BY &= C\\ Y & = B^{\#}(C+XA)\end{aligned}\]
Again every \(X\) has a \(Y\).
Construct \(Y\) from given \(X\) in \(O(d^3)\)-ops.
Solving \(XA-BY=C\)
when \(X=Y\)?
Case: \(B\) invertible
\[\begin{aligned} XA-BX &= C & X & = B^{-1}(C+XA)\end{aligned}\]
...no clear solution.
Direct attack...
\[XA-BX = C\]
- System of linear equations: \[\begin{aligned}\sum_{\ell} X_{i\ell}A_{\ell k}-\sum_{m}B_{im}X_{mk}& =C_{ij}\\ (A^t\otimes I_d-I_d\otimes B)vec(X)& =vec(C)\end{aligned}\]
- \(d^2\) equations in \(d^2\) unknowns...\(O(d^6)\)-ops
d=1000 makes for exaflops of work!
100 Mega Watts @ 10 gigflops/Watt.
vs. 0.1 Watt for \(O(d^3)\) version.
Sylvester's Theorem: If \(A,B\) are square and have a common eigen value then \[(\exists ! X)(XA-BX=C).\]
By dimensions expect at most one solution, i.e. \(d^2\) equations in \(d^2\) unknowns.
Bartels-Stewart
Ways to get Spec(A)
Iteratively apply \(A\) to vector \(v\) till \(\exists\alpha_i\)\[\begin{aligned} A^{k+1} v& =\alpha_0 A^0v+\cdots +\alpha_k A^k v \\ ann_A(x) & = x^{k+1}-a_{k-1}x^{k-1}-\cdots -a_1 x^1 -a_0\end{aligned}\] Choices in how to (a) discover relation (b) apply relation.
- (Algebraic Lens) Krylov, Lanczos, Coppersmith, Weidemann,
- (Geometric Lens) QR, SVD,
- (Practical Lens) software has a button I can push.
Bartels-Stewart Theorem. There is an \(O(d^3)\)-time algorithm to solve \(XA-BX=C\).
Find Schur Decomposition
\[\begin{aligned} F^*AF & = \begin{bmatrix} I & * \\ 0 & \tilde{A}\end{bmatrix} & EBE^* & = \begin{bmatrix} I & * \\ 0 & \tilde{B}\end{bmatrix}\end{aligned}\]
Solve smaller system \[\tilde{X}\tilde{A}-\tilde{B}\tilde{X}=\tilde{C}\] and pull back through
\[EXAF+EBYF=ECF\]
Why did \(X=Y\) make solving so hard?
Why so few solutions?
\(X=Y\) shifts from one Sylvester system to a simultaneous system...
\[\begin{aligned} XA_1 -B_1 Y & = C_1\\ XA_2-B_2 Y & = C_2\end{aligned}\] where \(A_2=I_d, B_2=I_d,C_2=0\).
Same happens if \(X=X^t, \bar{X}^t\)
Real problem is to solve:
\[(\forall i)(XA_i - B_i Y=C_i)\]
\[(\forall i)(XA_i - B_i Y=C_i)\] in algebra
- Roth Removal Rule/Block-diagonalizing/Module direct sums \[\begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}\begin{bmatrix} B_i & C_i\\ 0 & A_i \end{bmatrix}\begin{bmatrix} I & X\\ 0 & I\end{bmatrix}=\begin{bmatrix} B_i & 0\\ 0 & A_i\end{bmatrix}\]
- Endomorphisms of modules \[End(_R M)=\{ X\mid (\forall i)(XR_i=R_iX)\}\]
- Adjoint algebras \[\{(X,Y)\mid (\forall i)(XA_i=B_iY)\}\]
- Schneider conjecture: \(O(d^6)\) is the complexity
\[\sum_{\ell} X_{i\ell} A_{\ell jk} -\sum_{m}B_{im}Y_{mjk} =C_{ijk}\]
- \(d^2 e\) equations in \(2d^2\) variables, highly over determined (expect few to now solutions)
- Not very sparse: 1/d nonzeros;
- Random \(d^2\times d^2\) subsystem is basically random dense, so \(O(d^6)\)-ops to solve
\(A^s=[A_{**s}]\) \(B_t=[B_{*t*}]\)
(Technically this is the matrix for a transposed version.)
Tempted to echelonize?
It Shave an order of magnitude, but fills in quick.
And yet a ton of symmetry...exploit this!
Solving
\[(\forall i)(XA_i - B_i Y=C_i)\]
In \(O(d^3)\) ops away from edge cases
It is a tensor problem
- \([A]=[A_1,\ldots,A_c]\in K^s\otimes K^b\otimes K^c\)
- \([B]=[B_1,\ldots,B_c]\in K^a\otimes K^t\otimes K^c\)
- \([C]=[C_1,\ldots,C_c]\in K^a\otimes K^b\otimes K^c\)
- Now under tensor contraction (generalized matrix multiplication) solve one tensor equation \[X [A]-[B]Y=[C]\]
Solution Scheme Repeat Bartels-Stewart
- Reduce dimension via operators \(E,F\) \[\begin{aligned} EX [A]F+E[B]YF & =E[C]F \\ \tilde{X}[\tilde{A}]+[\tilde{B}]\tilde{Y} & = \tilde{C}\end{aligned}\]
- Reconstruct solution from lower-dimensional solution
What is different?
- \(E,F\) are 4-tensors, denote \([[E]]\) (was 2-tensors) \[\begin{aligned} [[E]]X [A][[F]]+[[E]][B]Y[[F]] & =[[E]][C][[F]] \\ \tilde{X}[\tilde{A}]+[\tilde{B}]\tilde{Y} & = \tilde{C}\end{aligned}\]
- These are tensor contractions and it is quite incorrect to write as binary products, e.g. [A][[F]], we need new notation.
2-tensor/Matrix
3-tensor/list of matrices
4-tensor/Filing cabinet of spreadsheets
- for the dot-product \[\sum_{\ell} X_{i\ell} A_{\ell jk}\]
- more generally can replace dot-product with any bilinear form \(\langle|\rangle:U\times U\rightarrowtail K\)
\([[E]]X[A][[F]]\)
\(-[[E]][B]Y[[F]]\)
\(=[[E]][C][[F]]\)
\[\begin{bmatrix} (A^1)^{-1} & \\ -(A^1)^{-1}A^2& I & & \\ \vdots & & \ddots & \\ -(A^1)^{-1} A^c & & & I\end{bmatrix}\begin{bmatrix} A^1\\ A^2 \\ \vdots \\ A^c\end{bmatrix}=\begin{bmatrix}A^1\\ 0\\ \vdots\\ 0\end{bmatrix}\]
Schur Decomposition in Matrix blowup
Schur Decomposition Tensorially
In tensor form...use face to clear the drawer (face elimination)
\[\begin{bmatrix} A_1^{-1} & \\ -A_1^{-1}A_2& I & & \\ \vdots & & \ddots & \\ -A_1^{-1} A_c & & & I\end{bmatrix}\begin{bmatrix} A_1\\ A_2 \\ \vdots \\ A_c\end{bmatrix}=\begin{bmatrix}A_1\\ 0\\ \vdots\\ 0\end{bmatrix}\]
In tensor form...use face to clear the drawer (face elimination)
Controlled 4-tensor/Filing cabinet one drawer full
In tensor form...use face to clear the drawer (face elimination)
=
\([[E]]X[A][[F]]\)
\(-[[E]][B]Y[[F]]\)
\(=[[E]][C][[F]]\)
(As [[E]] slides past [[F]] we can face echelonize [B])
(Face echelonize [A] leaves a tiny system to solve)
Loose Ends...
- What if \(A_1\), \(B_1\) not invertible? Not Square
- Combine more than one slice.
- Tao's "Slice rank" is the metric
- Numerical Stability
- Implement and refine
Schneider's Problem: The actual complexity now is \(O(d^{3.5})\) to find one solution.
The takeaway
a tensor problem needs a tensor solution
QuickSylver
By James Wilson
QuickSylver
Linear time solution to simultaneous Sylvester equation solvers.
- 475