Denser Tensor Spaces

2021 James B. Wilson, Colorado State University

Major credit is owed to...

Uriya First

U. Haifa

Joshua Maglione

Bielefeld

Peter Brooksbank

Bucknell

  • The National Science Foundation Grant DMS-1620454
  • The Simons Foundation support for Magma CAS
  • National Security Agency Mathematical Sciences Program grants
  • U. Colorado Dept. Computer Science
  • Colorado State U. Dept. Mathematics

Three Goals of This Talk

  • Cluster non-zeros in a tensor.
  • Compare two tensors up to basis change
  • Make the algorithms for the above feasible.

Can we agree on "tensors"?

(I hope!)

Are all of these tensors?

Don't be a joke

If you don't use linear combinations on some axis of your data...

then it's not actually a tensor, sorry.

"What is a vector?"

"An element of a vector space."

\(U_0,U_1,\ldots\) are vector spaces (or modules).

Linear maps:

\(U_0\oslash U_1=\{f:U_1\to U_0\mid f(u+\lambda v)=f(u)+\lambda f(v)\}\)

Bi-linear maps:

\(U_0\oslash U_1\oslash U_2 := \{f:U_2\to U_0\oslash U_1 \mid f(u+\lambda v)=f(u)+\lambda f(v)\}\)

\(k\)-multi-linear maps:

\(U_0\oslash \cdots\oslash U_{k-1}\oslash U_k:=(U_0\oslash\cdots\oslash U_{k-1})\oslash U_k\)

"What is a tensor?"

"An element of a tensor space."

\(U_0,U_1,\ldots\) are vector spaces (or modules).

\(k\)-multi-linear maps:

\(U_0\oslash \cdots\oslash U_{k-1}\oslash U_k:=(U_0\oslash\cdots\oslash U_{k-1})\oslash U_k\)

Defn. A Tensor Space \(T\) is a vector space and a linear map \[\langle \cdot|:T\to U_0\oslash\cdots\oslash U_k\]

Tensors are elements of tensor spaces.

\(T=\mathbb{M}_{2\times 3}(\mathbb{R})\) is a tensor space in at least 3 ways!

Matrix as linear map on right.

\[\langle \cdot |:T\to \mathbb{R}^2\oslash \mathbb{R}^3\]

\[\langle M|u\rangle := Mu\]

Matrix as linear map on left.

\[|\cdot \rangle:T\to \mathbb{R}^3\oslash \mathbb{R}^2\]

\[\langle v| M\rangle := v^{\dagger}M\]

Matrix as bilinear form.

\[|\cdot|:T\to \mathbb{R}\oslash\mathbb{R}^2\oslash \mathbb{R}^3\]

\[\langle v| M|u\rangle := v^{\dagger}Mu\]

This abstraction does wonders for creating a fluid tensor software package.
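The three readings of one \(2\times 3\) matrix can be written out directly. The following is a minimal NumPy sketch; the function names `right_map`, `left_map` and `bilinear_form` are illustrative and not taken from any tensor package.

```python
import numpy as np

# One element of T = M_{2x3}(R), read in three different ways.
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def right_map(M, u):
    """<M|u> := M u, a linear map R^3 -> R^2."""
    return M @ u

def left_map(v, M):
    """<v|M> := v^T M, a linear map R^2 -> R^3."""
    return v @ M

def bilinear_form(v, M, u):
    """<v|M|u> := v^T M u, a bilinear map R^2 x R^3 -> R."""
    return v @ M @ u

u = np.array([1.0, 0.0, -1.0])
v = np.array([2.0, 1.0])

as_right = right_map(M, u)       # vector in R^2
as_left = left_map(v, M)         # vector in R^3
as_form = bilinear_form(v, M, u) # scalar
```

The stored data never changes; only the interpretation map does, which is the point of separating \(T\) from \(\langle\cdot|\).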

Operating on Tensors

Tier I

low-brow: reindex,

high-brow: affine transforms of polytopes 

Evaluation

Contractions

Layout data so nothing moves!

Logically equivalent circuits

Fight eager evaluation

Tier II

=

(Data) Acting on Tensors as arrays

(Lin. Alg.) Acting on tensors as functions

(Physics/Algebra) Acting on Tensors as Operads/Networks

Tier II key: Iterate

Generalizes Characteristic polynomial to ideals

\[I(t,\omega)=(x^2-x, y^2-y, xy)\qquad I(t,\tau)=(x^2, y^3,xy-y^2)\]

Multi-spectrum Rule = \(\langle t| p(\omega) =0\)

Thm FMW-Connection. \(S\) set of tensors, \(P\subset \mathbb{R}[X]\), \(\Omega\subset \prod_a \mathbb{M}_{d_a}(\mathbb{R})\)

\[T(P,\Omega)=\{t\mid P\textnormal{ in multi-spec } t\text{ at } \Omega\}\]

\[I(S,\Omega)=\{p\mid p\textnormal{ in multi-spec } S\text{ at } \Omega\}\]

\[Z(S,P)=\{\omega \mid P\textnormal{ in multi-spec } S\text{ at } \omega\}\]

Then \[S\subset T(P,\Omega)\Leftrightarrow P\subset I(S,\Omega) \Leftrightarrow \Omega\subset Z(S,P)\]

Thm FMW-Construction.

These are each polynomial time computable.
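To make the multi-spectrum rule concrete, here is a small NumPy sketch that evaluates \(\langle t|p(\omega)\) for a tensor stored as an array. It assumes, as a convention chosen for this example only, that each \(\omega_a\) acts by matrix multiplication along axis \(a\), and that \(p\) is given as a dictionary from exponent tuples \(e\) to coefficients \(\lambda_e\). The helper names are hypothetical.

```python
import numpy as np

def act_on_axis(t, M, axis):
    """Apply the matrix M to one axis of the array t."""
    return np.moveaxis(np.tensordot(M, t, axes=(1, axis)), 0, axis)

def apply_polynomial(t, poly, omega):
    """Evaluate <t|p(omega); poly maps exponent tuples to coefficients."""
    total = np.zeros_like(t, dtype=float)
    for exponents, coeff in poly.items():
        term = t.astype(float)
        for axis, (M, e) in enumerate(zip(omega, exponents)):
            for _ in range(e):          # apply omega_axis e times
                term = act_on_axis(term, M, axis)
        total += coeff * term
    return total

def in_multispectrum(t, poly, omega, tol=1e-10):
    """True when <t|p(omega) vanishes up to the tolerance."""
    return np.linalg.norm(apply_polynomial(t, poly, omega)) < tol

# Example: p = x_1 - x_2 on a 2-tensor T encodes F T = T G^t.
T = np.eye(2)
F = np.array([[1.0, 2.0], [3.0, 4.0]])
difference = {(1, 0): 1.0, (0, 1): -1.0}

holds = in_multispectrum(T, difference, (F, F.T))   # F T = T (F^t)^t
fails = in_multispectrum(T, difference, (F, F))     # F is not symmetric
```

With this convention the polynomial \(x_1-x_2\) recovers an adjoint-style condition and \(x_1+\cdots+x_k\) a derivation-style one; the talk's own bra-ket convention may differ by transposes.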


Tier III

Functors on tensors, e.g. \((U_0\oslash\cdots \oslash U_k)\to (U_i\otimes\cdots\otimes U_k\to U_0\oslash\cdots\oslash U_{i-1})\)

Program these functors once and you save time by avoiding boilerplate later.
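In coordinates, the functor above amounts to regrouping axes. A minimal sketch, assuming the tensor is stored as a contiguous NumPy array with axes ordered \(0,\ldots,k\); the name `curry_at` is made up for this example.

```python
import numpy as np
from math import prod

def curry_at(t, i):
    """View a (d_0, ..., d_k) array as a matrix whose rows are indexed by
    axes 0..i-1 and whose columns are indexed by axes i..k."""
    if not 0 < i < t.ndim:
        raise ValueError("split index must leave axes on both sides")
    return t.reshape(prod(t.shape[:i]), prod(t.shape[i:]))

t = np.arange(24.0).reshape(2, 3, 4)
m1 = curry_at(t, 1)   # shape (2, 12)
m2 = curry_at(t, 2)   # shape (6, 4)
```

For a contiguous array `reshape` returns a view, so no data moves; only the indexing changes.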

How to shrink a tensor space

Red is the space we search/work within.

Add some algebra \(A\) in the form of \(U\otimes_A V\)

The bigger the algebra the better.

 

Rule of thumb

\[\dim (U\otimes_A V)\approx \frac{\dim U\dim V}{\dim A}\]

 

Some effort now in working with the algebra & modules, yet you can at least prove and plan for that.
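One way to sanity-check the rule of thumb is to build the balancing relations \(ua\otimes v-u\otimes av\) explicitly and count what survives the quotient. The sketch below assumes a setup that is not from the talk: \(U\) is the space of \(m\times d\) matrices, \(V\) the space of \(d\times n\) matrices, and \(A=\mathbb{M}_d(\mathbb{R})\) acts by matrix multiplication on the right of \(U\) and the left of \(V\).

```python
import numpy as np

def balanced_tensor_dim(m, n, d):
    """dim of U (x)_A V for U = m x d matrices, V = d x n matrices, A = M_d(R)."""
    dim_u, dim_v = m * d, d * n
    relations = []
    for i in range(dim_u):
        u = np.zeros(dim_u); u[i] = 1.0
        u = u.reshape(m, d)
        for j in range(dim_v):
            v = np.zeros(dim_v); v[j] = 1.0
            v = v.reshape(d, n)
            for p in range(d):
                for q in range(d):
                    a = np.zeros((d, d)); a[p, q] = 1.0
                    # the relation (u a) (x) v - u (x) (a v) inside U (x) V
                    rel = (np.kron((u @ a).ravel(), v.ravel())
                           - np.kron(u.ravel(), (a @ v).ravel()))
                    relations.append(rel)
    rank = np.linalg.matrix_rank(np.array(relations))
    return int(dim_u * dim_v - rank)

exact = balanced_tensor_dim(2, 6, 2)      # U = R^4, V = R^12, A = M_2(R)
estimate = (2 * 2) * (2 * 6) // (2 * 2)   # dim U * dim V / dim A
```

In this particular family the rule of thumb is exact; for general modules it is only an approximation.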

Adjoint-Tensor Theorem

  • Given: 2-tensors \(S\subset \mathbb{M}_{a\times b}(\mathbb{R})\)
  • Want: Algebra to shrink space around \(S\)

Theorem Brooksbank-W. (2012) \[\mathrm{Adj}(S)=\{(F,G)\in \mathbb{M}_a(\mathbb{R})\times \mathbb{M}_b(\mathbb{R})\mid (\forall T\in S)(FT=TG^t)\}\] is an optimal choice and unique up to isomorphism.
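Since the defining condition \(FT=TG^t\) is linear in \((F,G)\), the adjoint algebra is the null space of a linear system. A brute-force NumPy sketch that only reports its dimension; the function name is illustrative, and this dense approach is not the efficient method discussed later in the talk.

```python
import numpy as np

def adjoint_dimension(S):
    """dim of {(F, G) : F T = T G^t for all T in S}, for a list S of a x b arrays."""
    a, b = S[0].shape
    n_vars = a * a + b * b
    columns = []
    for idx in range(n_vars):
        z = np.zeros(n_vars); z[idx] = 1.0
        F = z[:a * a].reshape(a, a)
        G = z[a * a:].reshape(b, b)
        # image of this basis vector under (F, G) -> (F T - T G^t)_{T in S}
        columns.append(np.concatenate([(F @ T - T @ G.T).ravel() for T in S]))
    rank = np.linalg.matrix_rank(np.array(columns))
    return int(n_vars - rank)

I2 = np.eye(2)
D = np.diag([1.0, 2.0])
dim_one = adjoint_dimension([I2])      # pairs (F, F^t): dimension 4
dim_two = adjoint_dimension([I2, D])   # F must also commute with D: dimension 2
```

Adding more tensors to \(S\) can only shrink \(\mathrm{Adj}(S)\), which the second call illustrates.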

 

 

 

 

 

 

How not to shrink a tensor space

\(\mathbb{R}^4\otimes_{\mathbb{R}}\mathbb{R}^{12}\otimes_{\mathbb{R}}\mathbb{R}^{6}\)

\(\mathbb{R}^4\otimes_{\mathbb{M}_2(\mathbb{R})}\mathbb{R}^{12}\otimes_{\mathbb{M}_3(\mathbb{R})}\mathbb{R}^{6}\)

\(\cong\mathbb{R}^2\otimes_{\mathbb{R}}\mathbb{R}^{2}\otimes_{\mathbb{R}}\mathbb{R}^{2}\)

\(\mathbb{R}^4\otimes_{\mathbb{M}_4(\mathbb{R})}\mathbb{R}^{12}\otimes_{\mathbb{M}_3(\mathbb{R})}\mathbb{R}^{6}\)

\(\cong\mathbb{R}\otimes_{\mathbb{R}}\mathbb{R}\otimes_{\mathbb{R}}\mathbb{R}^{2}\)

\(\mathbb{R}^4\otimes_{\mathbb{M}_2(\mathbb{R})}\mathbb{R}^{12}\otimes_{\mathbb{M}_6(\mathbb{R})}\mathbb{R}^{6}\)

\(\cong\mathbb{R}^2\otimes_{\mathbb{R}}\mathbb{R}\otimes_{\mathbb{R}}\mathbb{R}\)
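The three isomorphisms above can be checked by counting multiplicities. The sketch assumes each \(U_i\) is a direct sum of copies of the natural (bi)module for the matrix algebras on either side of it, so its multiplicity is its dimension divided by the product of the two matrix sizes; `condensed_dims` is a hypothetical helper name.

```python
def condensed_dims(dims, matrix_sizes):
    """dims = (dim U_1, ..., dim U_v); matrix_sizes[i] = n means M_n(R)
    sits between U_{i+1} and U_{i+2} in the chain of tensor products."""
    sizes = [1] + list(matrix_sizes) + [1]
    out = []
    for d, left, right in zip(dims, sizes, sizes[1:]):
        if d % (left * right):
            raise ValueError("dimension is not a multiple of the module size")
        out.append(d // (left * right))
    return out

case_a = condensed_dims((4, 12, 6), (2, 3))   # M_2 and M_3
case_b = condensed_dims((4, 12, 6), (4, 3))   # M_4 and M_3
case_c = condensed_dims((4, 12, 6), (2, 6))   # M_2 and M_6
```

The choice of algebras matters: the first reduces 288 dimensions to 8, the other two reduce it to 2, at which point little of the tensor is left to study.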

Adjoint-tensor methods in valence \(>2\)

  • \(S\subset U_1\otimes\cdots\otimes U_v\) get \(\binom{v}{2}\) generalized adjoints i.e. \[\mathrm{Adj}(S)_{ij}\subset \mathbb{M}_{d_i}(\mathbb{R})\times \mathbb{M}_{d_j}(\mathbb{R})\]
  • But the product only has \(v-1\) spots to hang them...\[U_1\otimes_{A_{12}}U_2\otimes_{A_{23}}U_3\otimes\cdots\otimes_{A_{(v-1)v}} U_v.\]
  • We can permute the factors, but the choice is rather arbitrary.

Things I wonder about....

Why can't we just act on one side?

 

E.g. \(U\otimes_A V\) needs \(U_A, {_A V}\).  Worse, \(U\otimes_A V\otimes_B W\) needs \({_A V_B}\) a "bi-module".

 

Why do we tolerate "natural" isomorphisms \[U\otimes (V\otimes W)\cong (U\otimes V)\otimes W?\]

If it's natural, can't we just write these down as equal?!

A new tensor product

Whitney Tensor Product

A Different Tensor Product

New Tensor product:

\(\Omega\subset \mathbb{M}_{d_1}(\mathbb{R})\times\cdots \times \mathbb{M}_{d_k}(\mathbb{R})\); \(P\subset \mathbb{R}[x_1,\ldots,x_k]\)

 

\[\Xi(P,\Omega)=\left\langle \sum_e \lambda_e \omega_1^{e_1}\otimes\cdots\otimes \omega_k^{e_k} ~\middle|~\sum_e\lambda_e X^e\in P, \omega\in \Omega\right\rangle\]

\[(]U_1,\ldots,U_k[)_{\Omega}^P := (U_1\otimes \cdots \otimes U_k)/\Xi(P,\Omega)\]

Then we have:

\[(]\cdots[):U_1\times\cdots\times U_k\hookrightarrow (]U_1,\ldots,U_k[)_{\Omega}^P\]

defined by 

\[(]u_1,\ldots,u_k[) := u_1\otimes\cdots\otimes u_k+\Xi(P,\Omega)\]

Condensing Whitney Tensor Products

Condensing our alternative

One corner to contract makes each axis independent.

(No bimodules, no "associative" rules)

\[(]\cdots[):U_1\times\cdots\times U_k\hookrightarrow (]U_1,\ldots,U_k[)_{\Omega}^P\]

is the universal tensor such that every \(\omega\in \Omega\) has \(P\) in its multi-spectrum.

Intuition....force the spectrum

Consequence:

  • If \(P\) is homogeneous linear (so its zeros form some affine subspace) 
  • Then it is contained in a hyperplane.
  • Generically all hyperplanes are equal up to the torus action!
  • Maybe there is a universally smallest product....?

Derivation-Densor Theorem

  • Given: tensors \(t\in \mathbb{R}^{d_1}\otimes \cdots \otimes \mathbb{R}^{d_v}\)
  • Want: Algebra to shrink space around \(t\)

Theorem First-Maglione-W. \(\mathrm{Der}(t)\), consisting of all \((\delta_i)_i\in \prod\mathbb{M}_{d_i}(\mathbb{R})\) satisfying \[0=\langle t|\delta_1u_1,\ldots,u_v\rangle+\cdots+\langle t|u_1,\ldots,\delta_v u_v\rangle,\] is an optimal choice and unique up to isomorphism.
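The derivation condition is again linear in \((\delta_1,\ldots,\delta_v)\), so \(\mathrm{Der}(t)\) is a null space. A dense NumPy sketch that returns only its dimension, assuming \(t\) is stored as the coefficient array of the multilinear form \(\langle t|u_1,\ldots,u_v\rangle\); the function name is illustrative, and this is the naive approach rather than a fast one.

```python
import numpy as np

def derivation_dimension(t):
    """dim of Der(t) for a tensor given as an array of shape (d_1, ..., d_v)."""
    dims = t.shape
    offsets = np.cumsum([0] + [d * d for d in dims])
    n_vars = int(offsets[-1])
    columns = []
    for idx in range(n_vars):
        z = np.zeros(n_vars); z[idx] = 1.0
        total = np.zeros(dims)
        for axis, d in enumerate(dims):
            delta = z[offsets[axis]:offsets[axis + 1]].reshape(d, d)
            # coefficients of <t| ..., delta u_axis, ...>: delta^T applied to that axis
            total += np.moveaxis(np.tensordot(delta.T, t, axes=(1, axis)), 0, axis)
        columns.append(total.ravel())
    rank = np.linalg.matrix_rank(np.array(columns))
    return n_vars - int(rank)

form = np.eye(2)                       # valence 2: delta_2 = -delta_1^t
diag = np.zeros((2, 2, 2))
diag[0, 0, 0] = diag[1, 1, 1] = 1.0    # e_0 e_0 e_0 + e_1 e_1 e_1

dim_form = derivation_dimension(form)
dim_diag = derivation_dimension(diag)
```

For the diagonal \(2\times2\times2\) tensor the derivations are triples of diagonal matrices whose entries sum to zero in each coordinate, giving dimension 4.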

 

 

 

 

 

 

Lie algebras are required

Theorem First-Maglione-W.  If \(P=(\Lambda X)\), \(\Lambda\in \mathbb{M}_{r\times k}\) is full rank and if \[Z(t,P)=\{\omega\mid P\textnormal{ in the multi-spec } t\}\]

is an algebra, then it is a Lie algebra in at least \[k-2r\] coordinates.

\(U\otimes_A V\) for \(A\) associative is a fluke: it is the \(r=1\) case when \(k=2\).

Lie algebras are a good thing

  • No bimodule condition as Lie is skew-commutative.
  • Unlike square matrix rings, a fixed simple Lie algebra can act faithfully and irreducibly on unbounded dimensions.
  • Hence compression like this exists even with just 3-dimensional derivations!

Orthogonalizing data

Problem posed in:

Acar, Camtepe, and Yener, Collective Sampling and Analysis of High Order Tensors for Chatroom Communications, Proc. 4th IEEE Int. Conf. Intel. and Sec. Info., 2006, pp. 213–224

Orthogonalizing a tensor is an algebra problem.

Reality

The algebra is never there,

never that nice,

not even associative.

No algebra?  Make one by enrichment!

Its decompositions do the job.

\(t\in \mathbb{R}^{10}\otimes \mathbb{R}^7\)

\(\mathrm{Adj}(t)\cong \mathbb{R}\oplus \mathbb{M}_2(\mathbb{R})\) \[\begin{aligned} t & \in \mathbb{R}^{10}\otimes_{\mathrm{Adj}(t)}\mathbb{R}^7 \\ & \cong (\mathbb{R}^{10}\otimes_{\mathbb{R}\oplus 0}\mathbb{R}^7)\oplus (\mathbb{R}^{10}\otimes_{0\oplus \mathbb{M}_2(\mathbb{R})}\mathbb{R}^7) \\ & = (\mathbb{R}^2\otimes \mathbb{R}^3)\oplus (\mathbb{R}^4\otimes\mathbb{R}^2)\end{aligned}\]
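As a bookkeeping check, the multiplicities can be read off the decomposition above (this reading is inferred from the displayed result, not stated separately in the talk): the \(\mathbb{R}\) summand acts on a 2-dimensional piece of \(\mathbb{R}^{10}\) and a 3-dimensional piece of \(\mathbb{R}^7\), while \(\mathbb{M}_2(\mathbb{R})\) acts on 4 copies of \(\mathbb{R}^2\) in \(\mathbb{R}^{10}\) and 2 copies in \(\mathbb{R}^7\). \[\begin{aligned} 10 & = 2 + 4\cdot 2, \qquad 7 = 3 + 2\cdot 2,\\ \dim\left(\mathbb{R}^{10}\otimes_{\mathrm{Adj}(t)}\mathbb{R}^7\right) & = 2\cdot 3 + 4\cdot 2 = 14 \quad\text{versus}\quad \dim\left(\mathbb{R}^{10}\otimes\mathbb{R}^7\right)=70.\end{aligned}\]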

\[U\otimes_{A_1\oplus A_2}V\otimes W\cong (U_1\otimes_{A_1} V_1\otimes W)\oplus(U_2\otimes_{A_2}V_2\otimes W)\]

Orthogonalizes in higher valence

Did we get all decomposition types?

Thm. (FMW-Singular)

  1. Singularity types are in bijection with simplicial complexes \(\Delta\).
  2. The multi-spectrum of operators supported on singularities contains the Stanley-Reisner ideal \((X^e\mid \mathrm{supp}(e)\notin \Delta)\)
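For a small complex the generators of the Stanley-Reisner ideal can be listed by hand or by enumeration: the squarefree monomials on the minimal non-faces of \(\Delta\) already generate it. A brute-force sketch, assuming \(\Delta\) is given by its facets on vertices \(0,\ldots,n-1\); the function name is illustrative and the enumeration is exponential in \(n\).

```python
from itertools import combinations

def minimal_nonfaces(num_vertices, facets):
    """Minimal subsets of {0..n-1} not contained in any facet of the complex."""
    facets = [frozenset(f) for f in facets]

    def is_face(s):
        return any(s <= f for f in facets)

    found = []
    # Increasing size: a non-face containing no earlier non-face is minimal.
    for size in range(1, num_vertices + 1):
        for subset in combinations(range(num_vertices), size):
            s = frozenset(subset)
            if not is_face(s) and not any(g <= s for g in found):
                found.append(s)
    return [tuple(sorted(s)) for s in found]

hollow_triangle = minimal_nonfaces(3, [(0, 1), (1, 2), (0, 2)])
edge_and_point = minimal_nonfaces(3, [(0, 1), (2,)])
```

The hollow triangle gives the single generator \(x_0x_1x_2\); an edge plus an isolated vertex gives \(x_0x_2\) and \(x_1x_2\).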

Valence 2

Valence 3

Theory & Practice

Parker-Norton 1975 MeatAxe: polynomial time algorithm for \[XTX^{-1}=T_1\oplus \cdots \oplus T_{\ell}.\]

Performance: Dense 1/2 million dimensions in an hour, on desktop.

 

W. 2008: Proved uniqueness and polynomial-time algorithms for \[\begin{aligned} XTX^{\dagger} & = T_1\perp\cdots\perp T_{\ell}\\ XTY & =T_1\oplus \cdots\oplus T_{\ell}\end{aligned}\] 

Generalizations being explored now.

Pros.

  • Exact solution, no missing outliers, no need to train AI.
  • Comes with uniqueness theorems (Jordan-Hölder, Krull-Schmidt)
  • Polynomial-time, in fact nearly linear time.

Cons.

  1. The algebra is tough (non-associative, hard modules) (...solution...hire algebraist...)
  2. Implementations are in Computer Algebra Systems (with increased funding this will change)
  3. Noise model is unexplored (Statisticians I've asked are more optimistic than me...hmm.)

Structure in Networks

 

Data credit to Frank W. Marrs III

Los Alamos National Labs

Further Fact of spectra: 

If derivation and nilpotent then...

How to apply? 

  1. Compute Der,
  2. Use Lie theory algorithms to locate such \(\delta\)
  3. Change coordinates to make the data structured.

Actor Pair Exchange Conditions

Partners

Action/Reaction

Benefactor

Between pairs, 6 total (3 pictured)

\(\vdots\)

Actor Pair Exchange on 7 actors

Tensor:

  • Rows/Columns are pairs (a,b)
  • Each slice of the tensor is an exchange pattern.
  • Very combinatorial, but 10,584 parameters.
  • Calls for spectral "graph theory", but on hypergraphs

Actor Pair Exchange on 7 actors

Tensor:

  • Algebras identify 2 outliers
  • Cluster data into 4 layers (ideals) \(\Rightarrow\) breakup into 4 iterations
  • Reduced to 250 parameters.
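The parameter counts can be reproduced with a little arithmetic. This sketch assumes, as an inference from the slides rather than a statement in them, that rows and columns are indexed by ordered pairs of distinct actors and that there is one slice per exchange condition (6 in total).

```python
actors = 7
ordered_pairs = actors * (actors - 1)             # pairs (a, b) with a != b
slices = 6                                        # one per exchange condition
dense_parameters = ordered_pairs ** 2 * slices    # rows x columns x slices
reduced_parameters = 250                          # figure quoted on the slide
compression = dense_parameters / reduced_parameters
```

Under these assumptions the dense count matches the 10,584 quoted earlier, and the reduction is roughly a factor of 42.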

Entanglement Classes

 

Verstraete, Dehaene, De Moor, Verschelde, Four qubits can be entangled in nine different ways, Phys. Rev. A 65 (2002)

Williamson, Bultinck, Mariën, et al., Matrix product operators for symmetry-protected topological phases: Gauging and edge theories, Phys. Rev. B 94 (2016)

Quantum Particles modeled as vectors in \(\mathbb{C}^d\)

Entangled Particles as in \(\mathbb{C}^{d_1}\otimes\cdots \otimes \mathbb{C}^{d_k}\)

Visualize as n-gon.

Objective: What is the large-scale physics of a many body quantum material?

Comes down to symmetries of the tensors.

Valence 4?

What qualifies as a symmetry of a tensor?  Not just anything...surprisingly combinatorial...

Valence 3

Yes

No.

Thm FMW-Groupoid.

\[Z(t,p)^{\times}=\{\omega\mid p \textnormal{ in multi-spec } t\}\] is a group in some tensor category if, and only if, \[p=X^g(X^e-X^f)\] where \(e,f\) have disjoint support and are \(\{0,1\}\) valued.

Solution: chase the algebraic geometry of the spectra.... it turns out to be toric and thus combinatorial!

QuickSylver

Solving \((\forall i)(XA_i+B_iY=C_i)\) in nearly linear time

Derivations require Solving

\((\forall i)(XA_i+B_iY=C_i)\) and variations.

Naive:

Solving \((\forall i)(XA_i+B_iY=C_i)\) is linear in \(d^2\) variables so \(O(d^{2\omega})\subset O(d^6)\) work.

 

Good enough in theory, but hard to fit in memory and unrealistic at scale.
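For reference, the naive approach can be sketched in a few lines of NumPy: vectorize \(X\) and \(Y\), stack one Kronecker block per equation, and hand the dense system to a least-squares solver. The function name `solve_naive` is illustrative, and this is the baseline being criticized here, not QuickSylver.

```python
import numpy as np

def solve_naive(As, Bs, Cs):
    """Least-squares solution of X A_i + B_i Y = C_i for all i, as one dense system."""
    a, c = As[0].shape
    r, b = Bs[0].shape
    # Row-major vec: vec(X A) = (I_r kron A^T) vec(X), vec(B Y) = (B kron I_c) vec(Y)
    blocks = [np.hstack([np.kron(np.eye(r), A.T), np.kron(B, np.eye(c))])
              for A, B in zip(As, Bs)]
    system = np.vstack(blocks)
    rhs = np.concatenate([C.ravel() for C in Cs])
    sol, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    X = sol[:r * a].reshape(r, a)
    Y = sol[r * a:].reshape(b, c)
    return X, Y

rng = np.random.default_rng(0)
d = 4
As = [rng.standard_normal((d, d)) for _ in range(3)]
Bs = [rng.standard_normal((d, d)) for _ in range(3)]
X0, Y0 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Cs = [X0 @ A + B @ Y0 for A, B in zip(As, Bs)]   # consistent right-hand sides

X, Y = solve_naive(As, Bs, Cs)
residual = max(np.linalg.norm(X @ A + B @ Y - C) for A, B, C in zip(As, Bs, Cs))
```

The coefficient matrix has \(2d^2\) columns and \(d^2\) rows per equation, which is why memory runs out long before the theory does.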

 

Bartels-Stewart Type Solution for \[XA+BY=C\]

  1. Choose \(E\) and \(F\) low rank matrices with pseudo-inverses \(E^*,F^*\).
  2. Solve \[E(XA)F+E(BY)F=ECF\] which has lower dimension.
  3. Pullback solution using \(E^*,F^*\).

Yields \(O(d^{\omega})\) time algorithms, \(\omega\leq 3\)

Tensor Bartels-Stewart Solving \[(\forall i)(XA_i+B_iY=C_i)?\]

  1. Choose \([E]\) and \([F]\) low rank tensors with pseudo-inverses \([E]^*,[F]^*\).
  2. Solve \[[E](X[A])[F]+[E]([B]Y)[F]=[E]C[F]\] which has lower dimension.
  3. OVERLAPS DESTROY EACH OTHER'S WORK

\[\delta_A^{12}(u\otimes v\otimes w)=u\otimes v\otimes w-u_1\sum_{\ell=2} e_{\ell}\otimes e_{\ell}A v\otimes w\]

Prop. \(\delta_A^{12}\circ \delta_B^{13}=\delta_B^{13}\circ \delta_A^{12}\)

\(E=\delta_B^{13}\) and \(F=\delta_A^{12}\)

Face Elimination: a tensor solution

Tensor Bartels-Stewart Solving \[(\forall i)(XA_i+B_iY=C_i)?\]

  1. Choose \([E]\) and \([F]\) low rank tensors with pseudo-inverses \([E]^*,[F]^*\).
  2. Solve \[[E](X[A])[F]+[E]([B]Y)[F]=[E]C[F]\] which has lower dimension.
  3. OVERLAPS SLIDE PAST EACH OTHER

Thm Collery-Maglione-W.

QuickSylver solves simultaneous generalized Sylvester equations in time \(O(d^{3})\) (for 3-tensors).

Thank You!

 

Want details?

Several related videos/software/resources at

https://thetensor.space/

 

A recently updated version of some of the main results at

https://www.math.colostate.edu/~jwilson/papers/Densor-Final-arxiv.pdf
