Optimization Informed by Geometric Invariant Theory and Symplectic Geometry

Clayton Shonkwiler

Colorado State University

shonkwiler.org/icerm24 (this talk!)

Recent Progress on Optimal Point Distributions and Related Fields

June 3, 2024

Collaborators

Tom Needham

Florida State University

Dustin Mixon

The Ohio State University

Soledad Villar

Johns Hopkins University

Anthony Caine

Colorado State University

Funding

National Science Foundation (DMS–2107700)

Simons Foundation (#709150)

Take-Home Message

Symmetry + geometry sometimes tells you an optimization problem is easier than expected.

Equal-Norm Parseval Frames

A spanning set \(f_1, \dots , f_n \in \mathbb{C}^d\) is a frame.

\(\Rightarrow F = [f_1 \cdots f_n] \in \mathbb{C}^{d \times n}\)

Definition.

\(\{f_1,\dots, f_n\}\subset \mathbb{C}^d\) is a Parseval frame if \(\operatorname{Id}_{d\times d}=FF^*=f_1f_1^*+\dots+f_nf_n^*\).

An equal-norm Parseval frame (ENP frame) is a Parseval frame \(f_1,\dots , f_n\) with \(\|f_i\|^2=\|f_j\|^2\) for all \(i\) and \(j\).

\(\sum \|f_i\|^2=\operatorname{tr}F^*F=\operatorname{tr}FF^*=\operatorname{tr}\operatorname{Id}_{d \times d} = d\), so each \(\|f_i\|^2=\frac{d}{n}\).
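A quick numerical sanity check of the definition, as a sketch: the harmonic frame (the first \(d\) rows of the unitary \(n \times n\) discrete Fourier transform matrix) is a standard example of an ENP frame. The function name `harmonic_frame` and the choice \(d = 3\), \(n = 7\) are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def harmonic_frame(d, n):
    """First d rows of the unitary n x n DFT matrix; columns are the frame vectors."""
    rows = np.arange(d).reshape(-1, 1)
    cols = np.arange(n).reshape(1, -1)
    return np.exp(2j * np.pi * rows * cols / n) / np.sqrt(n)

d, n = 3, 7
F = harmonic_frame(d, n)

# Parseval condition: F F^* should be the d x d identity.
frame_operator = F @ F.conj().T

# Equal-norm condition: every squared column norm should equal d / n.
squared_norms = np.sum(np.abs(F) ** 2, axis=0)

print(np.allclose(frame_operator, np.eye(d)), squared_norms)
```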

Frame Potential

Definition [Benedetto–Fickus, Casazza–Fickus]

The frame potential is

\(\operatorname{FP}(F) = \|FF^\ast\|_{\operatorname{Fr}}^2\)

Proposition [cf. Welch]

The equal-norm Parseval frames are exactly the global minima of \(\operatorname{FP}|_{\text{equal norm}}\).

Theorem [Benedetto–Fickus]

As a function on equal-norm frames with fixed \(d\) and \(n\), \(\operatorname{FP}\) has no spurious local minima.
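A minimal sketch of the frame potential, assuming the normalization \(\|f_i\|^2 = \frac{d}{n}\) for equal-norm frames (the slide does not fix the common norm). Under that assumption \(\operatorname{tr} FF^\ast = d\), so Cauchy–Schwarz on the eigenvalues of \(FF^\ast\) gives \(\operatorname{FP}(F) \geq d\), with equality exactly for Parseval frames. The helper names below are hypothetical.

```python
import numpy as np

def frame_potential(F):
    """FP(F) = ||F F^*||_Fr^2."""
    return np.linalg.norm(F @ F.conj().T, "fro") ** 2

def random_equal_norm_frame(d, n, rng):
    """Random complex d x n matrix with every column rescaled to squared norm d/n."""
    G = rng.standard_normal((d, n)) + 1j * rng.standard_normal((d, n))
    return np.sqrt(d / n) * G / np.linalg.norm(G, axis=0)

rng = np.random.default_rng(0)
d, n = 3, 7

F_random = random_equal_norm_frame(d, n, rng)

# Harmonic frame: an equal-norm Parseval frame, used here as a known minimizer.
k = np.arange(d).reshape(-1, 1)
j = np.arange(n).reshape(1, -1)
F_enp = np.exp(2j * np.pi * k * j / n) / np.sqrt(n)

fp_random = frame_potential(F_random)
fp_enp = frame_potential(F_enp)
print(fp_random, fp_enp)
```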

Frame Potential

Optimization

Theorem [with Mixon, Needham, and Villar]

On the space of equal-norm frames, consider the initial value problem

\(\Gamma(F_0,0) = F_0, \qquad \frac{d}{dt}\Gamma(F_0,t) = -\operatorname{grad}\operatorname{FP}(\Gamma(F_0,t))\).

If \(F_0\) has full spark, then \(\lim_{t \to \infty} \Gamma(F_0,t)\) is an ENP frame.
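The theorem concerns the continuous-time flow. The sketch below is only a crude discretization of it, under several assumptions of ours: the Euclidean gradient \(4FF^\ast F\) is projected onto the tangent space of the product of spheres, a fixed small step is taken, and the columns are renormalized. The step size, iteration count, and function names are illustrative choices, and nothing here reproduces the convergence proof.

```python
import numpy as np

def frame_potential(F):
    return np.linalg.norm(F @ F.conj().T, "fro") ** 2

def descend_fp(F0, step=0.005, iters=4000):
    """Projected gradient steps on equal-norm frames, renormalizing columns each step."""
    F = F0.copy()
    radius = np.linalg.norm(F0, axis=0)          # column norms, held fixed
    for _ in range(iters):
        G = 4 * (F @ F.conj().T) @ F             # Euclidean gradient of FP
        # Remove the part of each gradient column pointing along the frame vector.
        radial = np.sum((F.conj() * G).real, axis=0) / radius**2
        F = F - step * (G - F * radial)
        F = F * (radius / np.linalg.norm(F, axis=0))
    return F

rng = np.random.default_rng(0)
d, n = 3, 5
G0 = rng.standard_normal((d, n)) + 1j * rng.standard_normal((d, n))
F0 = np.sqrt(d / n) * G0 / np.linalg.norm(G0, axis=0)   # generic, hence full spark

F_final = descend_fp(F0)
fp_start, fp_end = frame_potential(F0), frame_potential(F_final)
print(fp_start, fp_end, "lower bound:", d)
```

With squared column norms \(\frac{d}{n}\), the value \(d\) printed as the lower bound is the frame potential of any ENP frame, so the gap `fp_end - d` measures how far the iterate is from being Parseval.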

Theorem [with Needham]

Same for fusion frames.

Why Not the Other Way?

Definition [cf. Bodmann–Casazza]

The normalizing potential is

\(\operatorname{NP}(f_1,\ldots,f_n) = \sum_{i=1}^n \|f_i\|^4.\)

Proposition [Bodmann–Haas]

The ENP frames are exactly the global minima of \(\operatorname{NP}|_{\text{Parseval}}\).

Theorem [with Caine and Needham]

On the space of Parseval frames, consider the initial value problem

\(\widetilde{\Gamma}(F_0,0) = F_0, \qquad \frac{d}{dt}\widetilde{\Gamma}(F_0,t) = -\operatorname{grad} \operatorname{NP}(\widetilde{\Gamma}(F_0,t))\).

If \(F_0\) is full spark, then \(\lim_{t \to \infty} \widetilde{\Gamma}(F_0,t)\) is an ENP frame.

Normal Matrices

Definition.

\(A \in \mathbb{C}^{d \times d}\) is normal if \(AA^\ast = A^\ast A\).

Equivalently,

\(0 = AA^\ast - A^\ast A = [A,A^\ast]\).

Define the non-normal energy \(\operatorname{E}:\mathbb{C}^{d \times d} \to \mathbb{R}\) by

\(\operatorname{E}(A) := \|[A,A^\ast]\|^2.\)

Obvious Fact.

The normal matrices are the global minima of \(\operatorname{E}\).

Theorem [with Needham]

The only critical points of \(\operatorname{E}\) are the global minima; i.e., the normal matrices.

Normal Matrices

\(\operatorname{E}\) is not quasiconvex!

Gradient Descent

Let \(\mathcal{F}: \mathbb{C}^{d \times d} \times \mathbb{R} \to \mathbb{C}^{d \times d}\) be the negative gradient flow of \(\operatorname{E}\); i.e.,

\(\mathcal{F}(A_0,0) = A_0, \qquad \frac{d}{dt}\mathcal{F}(A_0,t) = -\nabla \operatorname{E}(\mathcal{F}(A_0,t))\).

Theorem [with Needham]

For any \(A_0 \in \mathbb{C}^{d \times d}\), the matrix \(A_\infty := \lim_{t \to \infty} \mathcal{F}(A_0,t)\) exists, is normal, has the same eigenvalues as \(A_0\), and is real if \(A_0\) is.
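A sketch of this flow using explicit Euler steps. With respect to the real inner product \(\operatorname{Re}\operatorname{tr}(X^\ast Y)\), a direct computation gives \(\nabla \operatorname{E}(A) = 4[[A,A^\ast],A]\). The step size, iteration count, and the normalization \(\|A_0\|_{\operatorname{Fr}} = 1\) are our assumptions. Euler steps only approximate the flow, so the eigenvalues are preserved approximately rather than exactly.

```python
import numpy as np

def comm(X, Y):
    return X @ Y - Y @ X

def energy(A):
    """Non-normal energy E(A) = ||[A, A^*]||_Fr^2."""
    return np.linalg.norm(comm(A, A.conj().T), "fro") ** 2

def grad_energy(A):
    """Gradient of E for the inner product Re tr(X^* Y): 4 [[A, A^*], A]."""
    return 4 * comm(comm(A, A.conj().T), A)

def descend(A0, step=0.01, iters=5000):
    """Explicit Euler discretization of dA/dt = -grad E(A)."""
    A = A0.copy()
    for _ in range(iters):
        A = A - step * grad_energy(A)
    return A

rng = np.random.default_rng(1)
A0 = rng.standard_normal((4, 4))
A0 /= np.linalg.norm(A0)        # Frobenius norm 1 keeps the fixed step size safe

A_final = descend(A0)
print(energy(A0), energy(A_final))
print(np.sort_complex(np.linalg.eigvals(A0)))
print(np.sort_complex(np.linalg.eigvals(A_final)))
```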

Balancing Graphs

Define the unbalanced energy \(\operatorname{B}(A) := \|\mathrm{diag}([A,A^\ast])\|^2 = \sum_{i=1}^d \left(\|A_i\|^2 - \|A^i\|^2\right)^2\), where \(A_i\) is the \(i\)th row of \(A\) and \(A^i\) is the \(i\)th column.

If \(A = \left(a_{ij}\right)_{i,j} \in \mathbb{R}^{d \times d}\) satisfies \(\mathrm{diag}([A,A^\ast]) = 0\), then \(\widehat{A} = \left(a_{ij}^2\right)_{i,j}\) is the adjacency matrix of a balanced multigraph (every vertex has equal weighted in-degree and out-degree).
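The two expressions for \(\operatorname{B}\) can be compared numerically. The sketch below reads \(\|A_i\|\) as the norm of the \(i\)th row and \(\|A^i\|\) as the norm of the \(i\)th column, which is what the diagonal of \([A,A^\ast]\) computes. The function names are hypothetical, and the directed 4-cycle is our own example of a balanced graph.

```python
import numpy as np

def unbalanced_energy_diag(A):
    """B(A) via the diagonal of the commutator [A, A^*]."""
    C = A @ A.conj().T - A.conj().T @ A
    return np.sum(np.abs(np.diag(C)) ** 2)

def unbalanced_energy_rows_cols(A):
    """B(A) via squared row norms minus squared column norms."""
    sq = np.abs(A) ** 2
    return np.sum((sq.sum(axis=1) - sq.sum(axis=0)) ** 2)

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# Adjacency matrix of a directed 4-cycle: in-degree = out-degree = 1 everywhere.
cycle = np.roll(np.eye(4), 1, axis=1)

print(unbalanced_energy_diag(A), unbalanced_energy_rows_cols(A))
print(unbalanced_energy_diag(cycle))
```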

Balancing Graphs

Let \(\mathscr{F}(A_0,0) = A_0, \frac{d}{dt}\mathscr{F}(A_0,t) = - \nabla \operatorname{B}(\mathscr{F}(A_0,t))\) be negative gradient flow of \(\operatorname{B}\).

Theorem [with Needham]

For any \(A_0 \in \mathbb{C}^{d \times d}\), the matrix \(A_\infty := \lim_{t \to \infty} \mathscr{F}(A_0,t)\) exists, is balanced, has the same eigenvalues and principal minors as \(A_0\), and has zero entries wherever \(A_0\) does.

If \(A_0\) is real, so is \(A_\infty\), and if \(A_0\) has all non-negative entries, then so does \(A_\infty\).
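One way to see why zeros, signs, eigenvalues, and principal minors survive: with \(D = \mathrm{diag}([A,A^\ast])\), a direct computation gives \(\nabla \operatorname{B}(A) = 4(DA - AD)\), whose \((i,j)\) entry is \(4(D_i - D_j)a_{ij}\), so the flow rescales each entry multiplicatively. The sketch below uses a discretization of our own choosing, not a scheme from the talk: \(D\) is frozen over each step and the resulting linear equation is integrated exactly, which makes every step a conjugation by a positive diagonal matrix. Step size, iteration count, and the test matrix are assumptions.

```python
import numpy as np

def imbalance(A):
    """Vector D with D_i = ||row i||^2 - ||column i||^2."""
    sq = np.abs(A) ** 2
    return sq.sum(axis=1) - sq.sum(axis=0)

def unbalanced_energy(A):
    return np.sum(imbalance(A) ** 2)

def balance(A0, step=0.05, iters=2000):
    """Multiplicative steps: a_ij <- a_ij * exp(-4 * step * (D_i - D_j))."""
    A = A0.copy()
    for _ in range(iters):
        D = imbalance(A)
        A = A * np.exp(-4 * step * (D[:, None] - D[None, :]))
    return A

rng = np.random.default_rng(3)
A0 = rng.uniform(0.1, 1.0, size=(4, 4))
A0[0, 3] = 0.0                  # two structural zeros to track
A0[2, 1] = 0.0
A0 /= np.linalg.norm(A0)

A_final = balance(A0)
b_start, b_end = unbalanced_energy(A0), unbalanced_energy(A_final)
print(b_start, b_end)
```

Because each step is a diagonal similarity, the traces \(\operatorname{tr}(A^k)\), and hence the eigenvalues, agree for `A0` and `A_final` up to rounding.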

Why?

Symplectic Geometry

A symplectic manifold is a smooth manifold \(M\) together with a closed, non-degenerate 2-form \(\omega \in \Omega^2(M)\).

Example: \((\mathbb{R}^2,dx \wedge dy) = (\mathbb{C},\frac{i}{2}dz \wedge d\bar{z})\)

\(dx \wedge dy \left( a \frac{\partial}{\partial x} + b \frac{\partial}{\partial y},\ c \frac{\partial }{\partial x} + d \frac{\partial}{\partial y} \right) = ad - bc\), where

\((a,b) = a \vec{e}_1 + b \vec{e}_2 = a \frac{\partial}{\partial x} + b \frac{\partial}{\partial y}\)

\((c,d) = c \vec{e}_1 + d \vec{e}_2 = c \frac{\partial}{\partial x} + d \frac{\partial}{\partial y}\)

Examples

\((S^2,d\theta\wedge dz)\)

\((\mathbb{R}^2,dx \wedge dy) = (\mathbb{C},\frac{i}{2}dz \wedge d\bar{z})\)

\((S^2,\omega)\), where \(\omega_p(u,v) = (u \times v) \cdot p\)

\((\mathbb{C}^n, \frac{i}{2} \sum dz_k \wedge d\overline{z}_k)\)

\((\mathbb{C}^{m \times n}, \omega)\) with \(\omega(X_1,X_2) = -\operatorname{Im} \operatorname{trace}(X_1^* X_2)\).

Functions and Symplectic Gradients

If \(H: M \to \mathbb{R}\) is smooth, then there exists a unique vector field \(X_H\) so that \({dH = \iota_{X_H}\omega}\), i.e.,

\(dH(\cdot) = \omega(X_H, \cdot)\)

(\(X_H\) is called the Hamiltonian vector field for \(H\), or sometimes the symplectic gradient of \(H\))

Example. \(H: (S^2, d\theta\wedge dz) \to \mathbb{R}\) given by \(H(\theta,z) = z\).

\(dH = dz = \iota_{\frac{\partial}{\partial \theta}}(d\theta\wedge dz)\), so \(X_H = \frac{\partial}{\partial \theta}\).

\(H\) is constant on orbits of \(X_H\):

\(\mathcal{L}_{X_H}(H) = dH(X_H)=\omega(X_H,X_H) = 0\)
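A numerical check of this example, assuming the embedded description \(\omega_p(u,v) = (u \times v) \cdot p\) of the area form on the unit sphere. In those coordinates \(\frac{\partial}{\partial \theta}\) at \(p\) is \(e_3 \times p\), and its flow is rotation about the \(z\)-axis. The variable names and the random test point are illustrative.

```python
import numpy as np

def omega(p, u, v):
    """Area form on the unit sphere: omega_p(u, v) = (u x v) . p."""
    return np.dot(np.cross(u, v), p)

def X_H(p):
    """Candidate Hamiltonian vector field of H(p) = z: d/dtheta = e_3 x p."""
    return np.cross(np.array([0.0, 0.0, 1.0]), p)

rng = np.random.default_rng(4)
p = rng.standard_normal(3)
p /= np.linalg.norm(p)                  # random point on S^2

w = rng.standard_normal(3)
v = w - np.dot(w, p) * p                # random tangent vector at p

dH_v = v[2]                             # dH(v) = dz(v)
lhs = omega(p, X_H(p), v)               # omega(X_H, v)

# The time-t flow of X_H is rotation by angle t about the z-axis.
t = 0.7
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
q = R @ p                               # H(q) = q[2] should equal H(p) = p[2]

print(lhs, dH_v, p[2], q[2])
```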

Noether’s Theorem

“Every continuous symmetry has a corresponding conserved quantity”

Circle Actions

A circle action on \((M,\omega)\) determines a vector field \(X\) by

\(X(p) = \left.\frac{d}{dt}\right|_{t=0}e^{i t} \cdot p\)

\(S^1=U(1)\) acts on \((S^2,d\theta \wedge dz)\) by

\(e^{it} \cdot(\theta, z) = (\theta + t, z).\)

So \(X = \frac{\partial}{\partial \theta}\).

Symmetries and Conserved Quantities

Definition. A circle action on \((M,\omega)\) is Hamiltonian if there exists a momentum map

\(\mu: M \to \mathbb{R}\)

so that \(d\mu = \iota_{X}\omega = \omega(X,\cdot)\), where \(X\) is the vector field generated by the circle action. In other words, \(X = X_\mu\).

\(X = \frac{\partial}{\partial \theta}\)

\(\mu(\theta,z) = z\)

\(\iota_X\omega = \iota_{\frac{\partial}{\partial \theta}} d\theta \wedge dz = (d\theta \wedge dz)\left(\frac{\partial}{\partial \theta},\cdot \right) = dz \)

Nice Potentials

Suppose \(\mu: (M,\omega) \to \mathfrak{g}^\ast\) is the momentum map of a Hamiltonian \(G\) action.

Define \(\Phi: M \to \mathbb{R}\) by \(\Phi(p) = \|\mu(p)\|^2\).

Frances Kirwan

Theorem [Kirwan]

Reductive algebraic group action on Kähler manifold \(\Longrightarrow\) semistable points flow to global minima of \(\Phi\) by gradient descent.

This kind of function is really nice!

Geometric Invariant Theory (GIT)

The GIT quotient consists of group orbits which can be distinguished by \(G\)-invariant (homogeneous) polynomials.

\(\mathbb{C}^* \curvearrowright \mathbb{CP}^2\)

\(t \cdot [z_0:z_1:z_2] = [z_0: tz_1:\frac{1}{t}z_2]\)

Roughly: identify orbits whose closures intersect, throw away orbits on which all \(G\)-invariant polynomials vanish.

\( \mathbb{CP}^2/\!/\,\mathbb{C}^* \cong\mathbb{CP}^1\)
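For this action, \(z_0\) and \(z_1 z_2\) are invariant homogeneous polynomials, so one concrete reading of the isomorphism is the map \([z_0:z_1:z_2] \mapsto [z_0^2 : z_1 z_2]\). This explicit map is our illustration and is not stated on the slide. The sketch checks the invariance numerically on a lift to \(\mathbb{C}^3\) and confirms that both invariants vanish at \([0:1:0]\) and \([0:0:1]\), the points that get thrown away.

```python
import numpy as np

def act(t, z):
    """Action of t in C^* on a representative (z0, z1, z2) of a point in CP^2."""
    z0, z1, z2 = z
    return np.array([z0, t * z1, z2 / t])

def invariants(z):
    """Two invariant polynomials of the same degree: z0^2 and z1 * z2."""
    z0, z1, z2 = z
    return np.array([z0**2, z1 * z2])

rng = np.random.default_rng(6)
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
t = 0.3 + 1.1j

before = invariants(z)
after = invariants(act(t, z))

# Both invariants vanish here, so these points have no image in CP^1.
unstable_a = invariants(np.array([0.0, 1.0, 0.0], dtype=complex))
unstable_b = invariants(np.array([0.0, 0.0, 1.0], dtype=complex))

print(before, after)
```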

Groups, Actions, Maps

Questions

Do similar techniques work for

  1. Tightening (or normalizing) probabilistic frames?
  2. Constructing doubly-stochastic matrices?

What other nice configurations are minima of potentials of this form?

Does this machinery tell us anything about the Paulsen problem?

Thank you!

References

Fusion frame homotopy and tightening fusion frames by gradient descent

Tom Needham and Clayton Shonkwiler

Journal of Fourier Analysis and Applications 29 (2023), no. 4, 51

arXiv:2208.11045

Three proofs of the Benedetto–Fickus theorem

Dustin Mixon, Tom Needham, Clayton Shonkwiler, and Soledad Villar

Sampling, Approximation, and Signal Analysis (Harmonic Analysis in the Spirit of J. Rowland Higgins), Stephen D. Casey, M. Maurice Dodson, Paulo J. S. G. Ferreira and Ahmed Zayed, eds., Birkhäuser, Cham, 2023, 371–391

arXiv:2112.02916
