Optimization and Special Matrices

Clayton Shonkwiler

Colorado State University

shonkwiler.org/codex26 (this talk!)

CodEx

February 10, 2026

Joint Work With:

Tom Needham

Florida State University

Anthony Caine

Arizona State University

Funding

National Science Foundation (DMS–2107700)

Take-Home Messages

  1. Symmetry + geometry sometimes tells you an optimization problem is easier than expected.
  2. Optimization can tell us something about topology.

Some Special Matrices

  1. Self-adjoint
  2. Unitary
  3. Normal (equivalently: unitarily diagonalizable)
  4. Self-adjoint with given spectrum (e.g., Gram matrices of Parseval frames)
  5. Fixed singular values (e.g., Parseval frames)
  6. Fixed row/column norms (e.g., equal-norm frames)

Can these be realized as minima of some (nice) potential?

Normal Matrices

Definition.

\(A \in \mathbb{C}^{d \times d}\) is normal if \(AA^\ast = A^\ast A\).

Equivalently,

\(0 = AA^\ast - A^\ast A = [A,A^\ast]\).

Define the non-normal energy \(\operatorname{E}:\mathbb{C}^{d \times d} \to \mathbb{R}\) by

\(\operatorname{E}(A) := \|[A,A^\ast]\|^2.\)

Obvious Fact.

The normal matrices are the global minima of \(\operatorname{E}\).

Theorem [with Needham]

The only critical points of \(\operatorname{E}\) are the global minima; i.e., the normal matrices.
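The definition above can be checked numerically. The following is a minimal NumPy sketch, assuming the norm is the Frobenius norm (the slide does not name it); the helper names are illustrative only.

```python
import numpy as np

def commutator(X, Y):
    """Matrix commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

def non_normal_energy(A):
    """E(A) = ||[A, A*]||^2, using the Frobenius norm (an assumption)."""
    M = commutator(A, A.conj().T)
    return float(np.linalg.norm(M, "fro") ** 2)

# A Hermitian matrix and a rotation are both normal, so E vanishes on them.
hermitian = np.array([[2.0, 1j], [-1j, 3.0]])
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
# A nilpotent Jordan block is a standard example of a non-normal matrix.
jordan = np.array([[0.0, 1.0], [0.0, 0.0]])

energies = [non_normal_energy(M) for M in (hermitian, rotation, jordan)]
```

For the Jordan block, \([A,A^\ast] = \mathrm{diag}(1,-1)\), so its energy is 2.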

\(\operatorname{E}\) is not quasiconvex!

Gradient Descent

Let \(\mathcal{F}: \mathbb{C}^{d \times d} \times \mathbb{R} \to \mathbb{C}^{d \times d}\) be the negative gradient flow of \(\operatorname{E}\); i.e.,

\(\mathcal{F}(A_0,0) = A_0 \qquad \frac{d}{dt}\mathcal{F}(A_0,t) = -\nabla \operatorname{E}(\mathcal{F}(A_0,t))\).

Theorem [with Needham]

For any \(A_0 \in \mathbb{C}^{d \times d}\), the matrix \(A_\infty := \lim_{t \to \infty} \mathcal{F}(A_0,t)\) exists, is normal, has the same eigenvalues as \(A_0\), and is real if \(A_0\) is.
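The theorem can be illustrated with a crude discretization. This sketch assumes the Frobenius norm and the real inner product \(\mathrm{Re}\,\mathrm{Tr}(Y^\ast X)\), for which a direct computation gives \(\nabla \operatorname{E}(A) = 4[[A,A^\ast],A]\); the forward-Euler scheme, step size, and function names are choices made here, not taken from the talk.

```python
import numpy as np

def grad_E(A):
    """Gradient of E(A) = ||[A, A*]||_F^2 w.r.t. Re tr(Y* X): 4 [[A, A*], A]."""
    M = A @ A.conj().T - A.conj().T @ A
    return 4.0 * (M @ A - A @ M)

def flow_to_normal(A0, step=1e-3, n_steps=5000):
    """Forward-Euler approximation of dA/dt = -grad E(A) (a sketch only)."""
    A = np.array(A0, dtype=complex)
    for _ in range(n_steps):
        A = A - step * grad_E(A)
    return A

A0 = np.array([[1.0, 0.5], [0.0, 2.0]])      # non-normal, eigenvalues 1 and 2
A_inf = flow_to_normal(A0)

M_inf = A_inf @ A_inf.conj().T - A_inf.conj().T @ A_inf
residual = np.linalg.norm(M_inf)               # near 0: A_inf is nearly normal
eigs = np.sort(np.linalg.eigvals(A_inf).real)  # should stay near [1, 2]
imag_part = np.abs(A_inf.imag).max()           # a real start stays real
```

Each Euler step adds a commutator with \(A\), so the trace is preserved exactly; the remaining eigenvalue data is preserved only up to discretization error, which is why a small step is used.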

Why?

\(\mathbb{C}^{d \times d}\) is symplectic, with symplectic form \(\omega_A(X,Y) = -\mathrm{Im}\langle X,Y \rangle = -\mathrm{Im}\mathrm{Tr}(Y^\ast X)\).

A symplectic manifold is a smooth manifold \(M\) together with a closed, non-degenerate 2-form \(\omega \in \Omega^2(M)\).

Example: \((\mathbb{R}^2,dx \wedge dy) = (\mathbb{C},\frac{i}{2}dz \wedge d\bar{z})\)

\(dx \wedge dy \left( a \frac{\partial}{\partial x} + b \frac{\partial}{\partial y},\ c \frac{\partial }{\partial x} + d \frac{\partial}{\partial y} \right) = ad - bc\), where

\((a,b) = a \vec{e}_1 + b \vec{e}_2 = a \frac{\partial}{\partial x} + b \frac{\partial}{\partial y}\) and \((c,d) = c \vec{e}_1 + d \vec{e}_2 = c \frac{\partial}{\partial x} + d \frac{\partial}{\partial y}\).
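
As a sanity check on the sign convention, the matrix formula reduces to this example when \(d = 1\). Writing \(X = a + ib\) and \(Y = c + id\) (here \(d\) is the real number from the example, not the matrix size):

\(-\mathrm{Im}(\overline{Y}X) = -\mathrm{Im}\big((c - id)(a + ib)\big) = -\mathrm{Im}\big((ac + bd) + i(bc - ad)\big) = ad - bc\).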

Consider the conjugation action of \(\operatorname{SU}(d)\) on \(\mathbb{C}^{d \times d}\): \(U \cdot A  = U A U^\ast\).

This action is Hamiltonian with associated momentum map \(\mu: \mathbb{C}^{d \times d} \to \mathscr{H}_0(d)\) given by

\(\mu(A) := [A,A^\ast]\).

So \(\operatorname{E}(A) = \|\mu(A)\|^2\).
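A quick numerical check of the properties used here, as a sketch: it assumes \(\mathscr{H}_0(d)\) denotes traceless Hermitian matrices, and it tests equivariance with a unitary \(U\) from a QR factorization (not necessarily of determinant 1, which does not matter for conjugation). The variable names are illustrative.

```python
import numpy as np

def mu(A):
    """Momentum map mu(A) = [A, A*]."""
    return A @ A.conj().T - A.conj().T @ A

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
# The Q factor of a QR factorization is unitary.
U, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))

M = mu(A)
UAU = U @ A @ U.conj().T
is_hermitian = np.allclose(M, M.conj().T)
is_traceless = abs(np.trace(M)) < 1e-10
is_equivariant = np.allclose(mu(UAU), U @ M @ U.conj().T)
same_energy = np.isclose(np.linalg.norm(mu(UAU)), np.linalg.norm(M))
```

Equivariance is what makes \(\operatorname{E} = \|\mu\|^2\) invariant under unitary conjugation.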

Frances Kirwan

This kind of function is really nice!

Geometric Invariant Theory (GIT)

The GIT quotient consists of group orbits which can be distinguished by \(G\)-invariant (homogeneous) polynomials.

\(\mathbb{C}^* \curvearrowright \mathbb{CP}^2\)

\(t \cdot [z_0:z_1:z_2] = [z_0: tz_1:\frac{1}{t}z_2]\)

Roughly: identify orbits whose closures intersect, throw away orbits on which all \(G\)-invariant polynomials vanish.

\( \mathbb{CP}^2/\!/\,\mathbb{C}^* \cong\mathbb{CP}^1\)
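
In this example the invariants can be written down by hand. As a sketch, assuming the action is linearized with weights \((0,1,-1)\) exactly as displayed: the invariant polynomials are generated by \(z_0\) and \(z_1 z_2\), so the quotient map can be taken to be

\([z_0:z_1:z_2] \mapsto [z_0^2 : z_1 z_2] \in \mathbb{CP}^1\),

which is defined away from the points with \(z_0 = 0\) and \(z_1 z_2 = 0\); those are the orbits that get thrown away.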

Abelian Version

Let \(T \simeq \operatorname{U}(1)^{d-1}\) be the diagonal subgroup of \(\operatorname{SU}(d)\). The conjugation action of \(T\) on \(\mathbb{C}^{d \times d}\) is also Hamiltonian, with momentum map

\(A \mapsto \mathrm{diag}([A,A^\ast])\).

\([A,A^\ast]_{ii} = \|A_i\|^2 - \|A^i\|^2\), where \(A_i\) is the \(i\)th row of \(A\) and \(A^i\) is the \(i\)th column.

If \(A = \left(a_{ij}\right)_{i,j} \in \mathbb{R}^{d \times d}\) satisfies \(\mathrm{diag}([A,A^\ast]) = 0\), then \(\widehat{A} = \left(a_{ij}^2\right)_{i,j}\) is the adjacency matrix of a balanced multigraph.
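The row/column formula says that the momentum map measures out-weight minus in-weight at each vertex. A minimal sketch, assuming the convention that \(a_{ij}^2\) is the weight of the edge \(i \to j\) (the slide does not fix the direction); the function name is illustrative.

```python
import numpy as np

def imbalance(A):
    """diag([A, A*]): squared row norms minus squared column norms."""
    W = np.abs(A) ** 2                      # W[i, j] = weight of edge i -> j (assumed)
    return W.sum(axis=1) - W.sum(axis=0)    # out-weight minus in-weight

# A weighted 3-cycle 1 -> 2 -> 3 -> 1 with unequal weights is not balanced.
A = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
direct = np.diag(A @ A.conj().T - A.conj().T @ A).real

# With equal weights, every vertex has equal in- and out-weight.
cycle = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])
```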

Balancing Graphs

Define the unbalanced energy \(\operatorname{B}(A) := \|\mathrm{diag}([A,A^\ast])\|^2 = \sum_i \left(\|A_i\|^2 - \|A^i\|^2\right)^2\).

Let \(\mathscr{F}(A_0,0) = A_0, \frac{d}{dt}\mathscr{F}(A_0,t) = - \nabla \operatorname{B}(\mathscr{F}(A_0,t))\) be negative gradient flow of \(\operatorname{B}\).

Theorem [with Needham]

For any \(A_0 \in \mathbb{C}^{d \times d}\), the matrix \(A_\infty := \lim_{t \to \infty} \mathscr{F}(A_0,t)\) exists, is balanced, has the same eigenvalues and principal minors as \(A_0\), and has zero entries whenever \(A_0\) does.

If \(A_0\) is real, so is \(A_\infty\), and if \(A_0\) has all non-negative entries, then so does \(A_\infty\).

This is “local”: \(a_{ij}\) is updated by a multiple of \((\|A_j\|^2-\|A^j\|^2)-(\|A_i\|^2-\|A^i\|^2)\).
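The local update rule can be sketched in a few lines. Writing \(D_i = \|A_i\|^2 - \|A^i\|^2\), a direct computation gives \(\frac{d}{dt}a_{ij} = 4(D_j - D_i)\,a_{ij}\) for the negative gradient flow. The discretization below is a choice made here, not the talk's: each step rescales \(a_{ij}\) by \(\exp(4h(D_j - D_i))\), which is conjugation by a diagonal matrix and therefore keeps eigenvalues, zero entries, and signs exactly.

```python
import numpy as np

def imbalance(A):
    """D_i = squared norm of row i minus squared norm of column i."""
    W = np.abs(A) ** 2
    return W.sum(axis=1) - W.sum(axis=0)

def balance(A0, step=5e-3, n_steps=4000):
    """Multiplicative steps for the balancing flow (an assumed discretization)."""
    A = np.array(A0)
    for _ in range(n_steps):
        D = imbalance(A)
        # Entry (i, j) is rescaled by exp(4 * step * (D[j] - D[i])).
        A = A * np.exp(4.0 * step * (D[None, :] - D[:, None]))
    return A

# Weighted 3-cycle with entries 2, 1, 1; the product around the cycle is 2.
A0 = np.array([[0.0, 2.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0]])
A_inf = balance(A0)
expected = 2.0 ** (1.0 / 3.0) * (A0 != 0)   # equal entries with the same cycle product
```

In this example the squared Frobenius norm drops from 6 to \(3 \cdot 2^{2/3} \approx 4.76\), so the flow does not preserve total weight.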

[Figure: example graphs, labeled \(\|A\|^2=1\) and \(\|A\|^2=0.569\).]

Preserving Weights

By doing gradient flow \(\overline{\mathscr{F}}\) on the unit sphere, we can preserve weights:

Theorem [with Needham]

For any non-nilpotent \(A_0 \in \mathbb{C}^{d \times d}\) with \(\|A_0\|^2=1\), the matrix \(A_\infty := \lim_{t \to \infty} \overline{\mathscr{F}}(A_0,t)\) exists, is balanced, has Frobenius norm 1, and has zero entries whenever \(A_0\) does.

If \(A_0\) is real, so is \(A_\infty\), and if \(A_0\) has all non-negative entries, then so does \(A_\infty\).

Applications?

Equal-Norm Parseval Frames

A spanning set \(f_1, \dots , f_n \in \mathbb{C}^d\) is a frame.

\(\Rightarrow F = [f_1 \cdots f_n] \in \mathbb{C}^{d \times n}\)

Definition.

\(\{f_1,\dots, f_n\}\subset \mathbb{C}^d\) is a Parseval frame if \(\operatorname{Id}_{d\times d}=FF^*=f_1f_1^*+\dots+f_nf_n^*\).

An equal-norm Parseval frame (ENP frame) is a Parseval frame \(f_1,\dots , f_n\) with \(\|f_i\|^2=\|f_j\|^2\) for all \(i\) and \(j\).

\(\sum \|f_i\|^2=\operatorname{tr}F^*F=\operatorname{tr}FF^*=\operatorname{tr}\operatorname{Id}_{d \times d} = d\), so each \(\|f_i\|^2=\frac{d}{n}\).
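A concrete instance, as a sketch: three equally spaced vectors in \(\mathbb{R}^2\), scaled to squared norm \(d/n = 2/3\) (often called the Mercedes-Benz frame). The example and variable names are chosen here for illustration.

```python
import numpy as np

d, n = 2, 3
angles = 2 * np.pi * np.arange(n) / n
# Columns f_1, f_2, f_3 are unit vectors at 120-degree spacing, scaled by sqrt(d/n).
F = np.sqrt(d / n) * np.vstack([np.cos(angles), np.sin(angles)])

frame_operator = F @ F.conj().T                   # should be the 2x2 identity
squared_norms = np.sum(np.abs(F) ** 2, axis=0)    # should all equal d/n
```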

Frame Potential

Definition [Benedetto–Fickus, Casazza–Fickus]

The frame potential is

\(\operatorname{FP}(F) = \|FF^\ast\|_{\operatorname{Fr}}^2\)

Proposition [cf. Welch]

The equal-norm Parseval frames are exactly the global minima of \(\operatorname{FP}|_{\text{equal norm}}\).
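
One way to see the lower bound, as a sketch assuming the equal-norm constraint is \(\|f_i\|^2 = \frac{d}{n}\), so that \(\operatorname{tr} FF^\ast = d\): if \(\lambda_1, \dots, \lambda_d\) are the eigenvalues of \(FF^\ast\), then by Cauchy–Schwarz

\(\operatorname{FP}(F) = \sum_i \lambda_i^2 \geq \frac{1}{d}\left(\sum_i \lambda_i\right)^2 = d\),

with equality exactly when every \(\lambda_i = 1\), i.e., when \(FF^\ast = \operatorname{Id}_{d \times d}\).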

Theorem [Benedetto–Fickus]

As a function on equal-norm frames with fixed \(d\) and \(n\), \(\operatorname{FP}\) has no spurious local minima.

Frame Potential

Optimization

Theorem [with Mixon, Needham, and Villar; FFT 2021 video]

On the space of equal-norm frames, consider the initial value problem

\(\Gamma(F_0,0) = F_0, \qquad \frac{d}{dt}\Gamma(F_0,t) = -\operatorname{grad}\operatorname{FP}(\Gamma(F_0,t))\).

If \(F_0\) has full spark, then \(\lim_{t \to \infty} \Gamma(F_0,t)\) is an ENP frame.

Theorem [with Needham; CodEx 2022 video]

Same for fusion frames.

Frame Energy

Definition [Bodmann–Casazza]

The frame energy is

\(\operatorname{FE}(F) = \sum_{j,k} \left( \|f_j\|^2 - \|f_k\|^2\right)^2 = 2n \sum_j \|f_j\|^4 -2d^2\)
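
The second equality uses the Parseval condition. With \(x_j = \|f_j\|^2\), expanding the square gives

\(\sum_{j,k} (x_j - x_k)^2 = \sum_{j,k} \left(x_j^2 - 2x_jx_k + x_k^2\right) = 2n\sum_j x_j^2 - 2\Big(\sum_j x_j\Big)^2\),

and \(\sum_j x_j = \operatorname{tr} FF^\ast = d\) for a Parseval frame, which yields \(2n \sum_j \|f_j\|^4 - 2d^2\).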

Proposition [Bodmann–Haas]

The equal-norm Parseval frames are exactly the global minima of \(\operatorname{FE}|_{\text{Parseval}}\).

Theorem [Caine]

On the space of Parseval frames, consider the initial value problem

\(\widetilde{\Gamma}(F_0,0) = F_0 \qquad \frac{d}{dt}\widetilde{\Gamma}(F_0,t) = -\operatorname{grad} \operatorname{FE}(\widetilde{\Gamma}(F_0,t))\).

If \(F_0\) is full spark, then \(\lim_{t \to \infty} \widetilde{\Gamma}(F_0,t)\) is an ENP frame.

Frame potential: \(\operatorname{FP}(F) = \|FF^\ast\|_{\operatorname{Fr}}^2\)

Frame energy: \(\operatorname{FE}(F) = 2n \sum_j \|f_j\|^4 -2d^2\), i.e., \(\sum_j \|f_j\|^4\) up to a positive scale factor and an additive constant

Total Frame Energy

Definition.

For \(\vec{r} \in \mathbb{R}_+^n\), the total frame energy of a frame is

\(E_{\vec{r}}(F) :=\|FF^* - \mathbb{I}_d \|_{\text{Fr}}^2 + \frac{1}{4}\sum_j \left(\frac{\|f_j\|^2}{r_j}-1 \right)^2\)

Proposition.

The Parseval frames with \(\|f_j\|^2 = r_j\) for \(j=1,\dots , n\) are exactly the global minima of \(E_{\vec{r}}\).
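The definition translates directly into code. A minimal sketch, with an illustrative function name, evaluated on three equally spaced vectors in \(\mathbb{R}^2\) with \(r_j = 2/3\) (an example chosen here):

```python
import numpy as np

def total_frame_energy(F, r):
    """E_r(F) = ||F F* - I||_F^2 + (1/4) * sum_j (||f_j||^2 / r_j - 1)^2."""
    d = F.shape[0]
    tightness = np.linalg.norm(F @ F.conj().T - np.eye(d), "fro") ** 2
    norms_sq = np.sum(np.abs(F) ** 2, axis=0)
    return float(tightness + 0.25 * np.sum((norms_sq / r - 1.0) ** 2))

r = np.full(3, 2 / 3)
angles = 2 * np.pi * np.arange(3) / 3
F = np.sqrt(2 / 3) * np.vstack([np.cos(angles), np.sin(angles)])

energy_at_minimum = total_frame_energy(F, r)     # Parseval with the right norms
energy_scaled = total_frame_energy(2 * F, r)     # 18 + 27/4 = 24.75
```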

Variations on a Theme

Theorem [with Caine and Needham]

Let \(\vec{r} \in \mathbb{Q}_+^n\) such that \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r}) \neq \emptyset\). Consider

\(\Gamma_{\vec{r}}(F_0,0) = F_0,\quad \frac{d}{dt} \Gamma_{\vec{r}}(F_0,t) = -\nabla E_{\vec{r}}(\Gamma_{\vec{r}}(F_0,t))\).

If \(F_0 \in \mathbb{K}^{d \times n}\) is full spark, then \(\lim_{t \to \infty} \Gamma_{\vec{r}}(F_0,t)\) is in \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r})\).

Definition.

For \(\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}\) and \(\vec{r} \in \mathbb{R}_+^n\), let \(\operatorname{PF}^{\mathbb{K}}_d(\vec{r})\) be the space of Parseval frames \(F \in \mathbb{K}^{d \times n}\) with \(\|f_i\|^2 = r_i\).

If \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r}) \neq \emptyset\), we say \(\vec{r}\) is admissible.
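The flow in the theorem can be approximated by plain gradient descent. This is a sketch under stated assumptions: the gradient formula is computed here from the definition of \(E_{\vec{r}}\) (real case), and the starting frame, step size, and iteration count are illustrative choices, not from the talk.

```python
import numpy as np

def total_frame_energy(F, r):
    d = F.shape[0]
    tight = np.linalg.norm(F @ F.T - np.eye(d), "fro") ** 2
    norms_sq = np.sum(F ** 2, axis=0)
    return float(tight + 0.25 * np.sum((norms_sq / r - 1.0) ** 2))

def grad_total_frame_energy(F, r):
    """4 (F F^T - I) F, plus column j scaled by (||f_j||^2 / r_j - 1) / r_j."""
    d = F.shape[0]
    norms_sq = np.sum(F ** 2, axis=0)
    tight_part = 4.0 * (F @ F.T - np.eye(d)) @ F
    norm_part = F * ((norms_sq / r - 1.0) / r)    # broadcasts over columns
    return tight_part + norm_part

def descend(F0, r, step=0.01, n_steps=10000):
    """Forward-Euler approximation of dF/dt = -grad E_r(F)."""
    F = np.array(F0, dtype=float)
    for _ in range(n_steps):
        F = F - step * grad_total_frame_energy(F, r)
    return F

r = np.full(3, 2 / 3)                              # rational and admissible for d = 2
F0 = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0]])                   # full spark: any two columns are independent
F_inf = descend(F0, r)
final_energy = total_frame_energy(F_inf, r)
```

The theorem concerns the continuous flow; the discrete iteration is only expected to track it when the step is small.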

Topology of Frame Spaces

Frame Homotopy Conjecture [Larson 2002]

The space of ENP frames is path-connected.

Theorem [Cahill, Mixon, Strawn 2017]

The Frame Homotopy Conjecture is true.

Theorem [with Needham 2022; CodEx 2021 video]

\(\operatorname{PF}_d^{\mathbb{H}}(\vec{r})\) is path-connected for all admissible \(\vec{r} \in \mathbb{R}_+^n\).

Theorem [with Needham 2021; CodEx 2021 video]

\(\operatorname{PF}_d^{\mathbb{C}}(\vec{r})\) is path-connected for all admissible \(\vec{r}\in \mathbb{R}_+^n\).

Theorem [Mare 2024]

\(\operatorname{PF}_d^{\mathbb{R}}(\vec{r})\) is path-connected for certain \(\vec{r}\) with entries repeating in special patterns.

Theorem [Mare 2025]

\(\operatorname{PF}_d^{\mathbb{C}}(\vec{r})\) is simply-connected for admissible \(\vec{r} \in \mathbb{R}_+^n\) so that all \(r_{i_1} + \dots + r_{i_{n-k}} \geq 1\).

Topology from Optimization

Meta-Theorem

Gradient flow of \(E_{\vec{r}}\) gives a deformation retract of \(\mathbb{K}^{d \times n} \backslash \mathcal{U}_{\vec{r}}\) onto \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r})\).

Corollary [with Caine and Needham]

Let \(q \in \mathbb{Z}_{\geq 0}\) and 

\(d \geq \begin{cases} q+2 & \text{if } \mathbb{K}= \mathbb{R}\\ \frac{q+2}{2} & \text{if } \mathbb{K} = \mathbb{C}.\end{cases}\)

Then the space of ENP frames is \(q\)-connected for all \(n \geq \frac{d}{d-1}(d+q+1)\).
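To make the bounds concrete, here is a small helper that evaluates them exactly. The function names are hypothetical, and the extra requirement \(d \geq 2\) is an assumption added here so that \(\frac{d}{d-1}\) is defined. The corollary gives a sufficient condition only.

```python
from fractions import Fraction
from math import ceil

def min_dimension(q, field):
    """Smallest integer d allowed by the hypothesis ("R" or "C")."""
    bound = Fraction(q + 2) if field == "R" else Fraction(q + 2, 2)
    return max(2, ceil(bound))      # d >= 2 is assumed so d/(d - 1) makes sense

def min_frame_size(d, q):
    """Smallest integer n with n >= d/(d - 1) * (d + q + 1)."""
    return ceil(Fraction(d, d - 1) * (d + q + 1))

# Path-connectedness (q = 0) over R with d = 2 needs n >= 6.
n_real_path = min_frame_size(min_dimension(0, "R"), 0)
# Simple connectedness (q = 1) over C with d = 2 needs n >= 8.
n_complex_simply = min_frame_size(min_dimension(1, "C"), 1)
```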

Corollary [with Caine and Needham]

Let \(d \geq 2\). Then there exists \(\epsilon > 0\) so that, for any admissible \(\vec{r} \in \mathbb{Q}_+^n\) with \(\left|r_i - \frac{d}{n}\right| < \epsilon\) for all \(i\), the space \(\operatorname{PF}_d^{\mathbb{R}}(\vec{r})\) is path-connected.

Questions

Can we characterize exactly when \(\operatorname{PF}_d^{\mathbb{R}}(\vec{r})\) is path-connected?

Theorem [Kapovich–Millson 1995]

\(\operatorname{PF}_2^{\mathbb{R}}(\vec{r})/(SO(2) \times O(1)^n)\) is disconnected if and only if there exist \(i,j,k\) so that \(r_i + r_j > 1\), \(r_j + r_k > 1\), and \(r_k + r_i > 1\). (Note: if \(\vec{r}\) is admissible, \(r_1 + \dots + r_n = 2\).)
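The criterion is easy to test by machine. A sketch, assuming \(i, j, k\) are meant to be distinct indices; the function name is hypothetical. It applies to the quotient space in the theorem, not to \(\operatorname{PF}_2^{\mathbb{R}}(\vec{r})\) itself.

```python
from fractions import Fraction

def km_disconnected(r):
    """True if three distinct indices have all pairwise sums greater than 1."""
    top = sorted(r, reverse=True)[:3]
    # Among the three largest entries, the smallest pairwise sum is top[1] + top[2],
    # and if the three largest entries fail the test, every other triple fails too.
    return len(top) == 3 and top[1] + top[2] > 1

equal_three = [Fraction(2, 3)] * 3          # n = 3, entries sum to 2
equal_four = [Fraction(1, 2)] * 4           # n = 4, entries sum to 2
lopsided = [Fraction(3, 5)] * 3 + [Fraction(1, 5)]
```

For \(n = 3\) with equal norms the criterion reports a disconnected quotient, matching the fact that three equally spaced vectors and their mirror image are not related by a rotation and sign changes.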

What about other frame operators?

Can these methods be made numerically tractable?

What happens when \(\vec{r}\) is not admissible?

What other loss functions of interest fit this framework?

Thank you!

shonkwiler.org/codex26

Reference

Geometric approaches to matrix normalization and graph balancing

Tom Needham and Clayton Shonkwiler

Forum of Mathematics, Sigma 13 (2025), e149
