Optimization and Special Matrices

Clayton Shonkwiler

Colorado State University

shonkwiler.org/codex26 (this talk!)

CodEx

February 10, 2026

Joint Work With:

Tom Needham (Florida State University)

Anthony Caine (Arizona State University)

Funding

National Science Foundation (DMS–2107700)

Take-Home Messages

  1. Symmetry + geometry sometimes tells you an optimization problem is easier than expected.
  2. Optimization can tell us something about topology.

Some Special Matrices

  1. Self-adjoint
  2. Unitary
  3. Normal (equivalently: unitarily diagonalizable)
  4. Self-adjoint with given spectrum (e.g., Gram matrices of Parseval frames)
  5. Fixed singular values (e.g., Parseval frames)
  6. Fixed row/column norms (e.g., equal-norm frames)

Can these be realized as minima of some (nice) potential?

Normal Matrices

Definition.

\(A \in \mathbb{C}^{d \times d}\) is normal if \(AA^\ast = A^\ast A\).

Equivalently,

\(0 = AA^\ast - A^\ast A = [A,A^\ast]\).

Define the non-normal energy \(\operatorname{E}:\mathbb{C}^{d \times d} \to \mathbb{R}\) by

\(\operatorname{E}(A) := \|[A,A^\ast]\|^2.\)

Obvious Fact.

The normal matrices are the global minima of \(\operatorname{E}\).

Theorem [with Needham]

The only critical points of \(\operatorname{E}\) are the global minima; i.e., the normal matrices.
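A minimal numerical sketch of the non-normal energy, assuming \(\|\cdot\|\) is the Frobenius norm; the function names `commutator` and `non_normal_energy` are illustrative choices, not taken from the talk. It evaluates \(\operatorname{E}\) on a unitary matrix (normal, so the energy vanishes up to rounding) and on a \(2 \times 2\) Jordan block (not normal).

```python
import numpy as np


def commutator(X, Y):
    """Matrix commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X


def non_normal_energy(A):
    """E(A) = ||[A, A*]||^2, with the Frobenius norm (an assumption)."""
    M = commutator(A, A.conj().T)
    return np.linalg.norm(M, "fro") ** 2


rng = np.random.default_rng(0)

# A unitary matrix obtained from a QR factorization is normal.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))

# A nilpotent Jordan block is not normal: [J, J*] = diag(1, -1).
J = np.array([[0.0, 1.0], [0.0, 0.0]])

print(non_normal_energy(Q))  # zero up to rounding error
print(non_normal_energy(J))  # 2.0
```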

\(\operatorname{E}\) is not quasiconvex!

Gradient Descent

Let \(\mathcal{F}: \mathbb{C}^{d \times d} \times \mathbb{R} \to \mathbb{C}^{d \times d}\) be the negative gradient flow of \(\operatorname{E}\); i.e.,

\(\mathcal{F}(A_0,0) = A_0 \qquad \frac{d}{dt}\mathcal{F}(A_0,t) = -\nabla \operatorname{E}(\mathcal{F}(A_0,t))\).

Theorem [with Needham]

For any \(A_0 \in \mathbb{C}^{d \times d}\), the matrix \(A_\infty := \lim_{t \to \infty} \mathcal{F}(A_0,t)\) exists, is normal, has the same eigenvalues as \(A_0\), and is real if \(A_0\) is.
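The flow can be explored numerically. The sketch below is an explicit Euler discretization, which is an assumption of this example and not the continuous flow in the theorem. It uses the gradient \(\nabla \operatorname{E}(A) = 4[[A,A^\ast],A]\), computed with respect to the real inner product \(\mathrm{Re}\,\mathrm{Tr}(Y^\ast X)\) and the Frobenius norm. The step size, step count, and function names are illustrative.

```python
import numpy as np


def non_normal_energy(A):
    """E(A) = ||[A, A*]||_Fr^2."""
    M = A @ A.conj().T - A.conj().T @ A
    return np.linalg.norm(M, "fro") ** 2


def grad_energy(A):
    """Gradient of E for the inner product Re Tr(Y* X): 4 [[A, A*], A]."""
    M = A @ A.conj().T - A.conj().T @ A
    return 4.0 * (M @ A - A @ M)


def euler_flow(A0, step=0.01, n_steps=5000):
    """Explicit Euler steps for dA/dt = -grad E(A) (a discretization sketch)."""
    A = A0.copy()
    for _ in range(n_steps):
        A = A - step * grad_energy(A)
    return A


rng = np.random.default_rng(1)
A0 = rng.normal(size=(4, 4))
A0 /= np.linalg.norm(A0, "fro")  # unit Frobenius norm keeps the fixed step stable

A_end = euler_flow(A0)
print("energy:", non_normal_energy(A0), "->", non_normal_energy(A_end))
print("trace: ", np.trace(A0), "->", np.trace(A_end))
```

Each update adds a multiple of a commutator, so the trace is unchanged up to rounding, and a real starting matrix stays real. This is consistent with the theorem's statements about eigenvalues and realness, though a finite number of Euler steps only approximates \(A_\infty\).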

Why?

\(\mathbb{C}^{d \times d}\) is symplectic, with symplectic form \(\omega_A(X,Y) = -\mathrm{Im}\langle X,Y \rangle = -\mathrm{Im}\mathrm{Tr}(Y^\ast X)\).

A symplectic manifold is a smooth manifold \(M\) together with a closed, non-degenerate 2-form \(\omega \in \Omega^2(M)\).

Example: \((\mathbb{R}^2,dx \wedge dy) = (\mathbb{C},\frac{i}{2}dz \wedge d\bar{z})\)

\((a,b) = a \vec{e}_1 + b \vec{e}_2 = a \frac{\partial}{\partial x} + b \frac{\partial}{\partial y}\)

\((c,d) = c \vec{e}_1 + d \vec{e}_2 = c \frac{\partial}{\partial x} + d \frac{\partial}{\partial y}\)

\((dx \wedge dy) \left( a \frac{\partial}{\partial x} + b \frac{\partial}{\partial y}, c \frac{\partial }{\partial x} + d \frac{\partial}{\partial y} \right) = ad - bc\)

Consider the conjugation action of \(\operatorname{SU}(d)\) on \(\mathbb{C}^{d \times d}\): \(U \cdot A  = U A U^\ast\).

This action is Hamiltonian with associated momentum map \(\mu: \mathbb{C}^{d \times d} \to \mathscr{H}_0(d)\) (the traceless Hermitian \(d \times d\) matrices) given by

\(\mu(A) := [A,A^\ast]\).

So \(\operatorname{E}(A) = \|\mu(A)\|^2\).

Frances Kirwan

This kind of function is really nice!

Geometric Invariant Theory (GIT)

The GIT quotient consists of group orbits which can be distinguished by \(G\)-invariant (homogeneous) polynomials.

\(\mathbb{C}^* \curvearrowright \mathbb{CP}^2\)

\(t \cdot [z_0:z_1:z_2] = [z_0: tz_1:\frac{1}{t}z_2]\)

Roughly: identify orbits whose closures intersect, throw away orbits on which all \(G\)-invariant polynomials vanish.

\( \mathbb{CP}^2/\!/\,\mathbb{C}^* \cong\mathbb{CP}^1\)

Abelian Version

Let \(T \simeq \operatorname{U}(1)^{d-1}\) be the diagonal subgroup of \(\operatorname{SU}(d)\). The conjugation action of \(T\) on \(\mathbb{C}^{d \times d}\) is also Hamiltonian, with momentum map

\(A \mapsto \mathrm{diag}([A,A^\ast])\).

\([A,A^\ast]_{ii} = \|A_i\|^2 - \|A^i\|^2\), where \(A_i\) is the \(i\)th row of \(A\) and \(A^i\) is the \(i\)th column.

If \(A = \left(a_{ij}\right)_{i,j} \in \mathbb{R}^{d \times d}\) satisfies \(\mathrm{diag}([A,A^\ast]) = 0\), then \(\widehat{A} = \left(a_{ij}^2\right)_{i,j}\) is the adjacency matrix of a balanced multigraph.
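A quick numerical check of the row/column identity, with a small balanced example. This is a sketch: reading \(\widehat{A}_{ij}\) as the weight of the edge \(i \to j\) is an assumed convention, so that squared row norms are out-weights and squared column norms are in-weights.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

M = A @ A.conj().T - A.conj().T @ A        # [A, A*]
row_sq = np.sum(np.abs(A) ** 2, axis=1)    # ||A_i||^2: squared row norms
col_sq = np.sum(np.abs(A) ** 2, axis=0)    # ||A^i||^2: squared column norms
print(np.allclose(np.diag(M).real, row_sq - col_sq))  # True

# A directed 3-cycle with equal weights: every vertex has in-weight = out-weight.
C = 0.5 * np.roll(np.eye(3), 1, axis=1)
C_hat = C ** 2                             # weighted adjacency matrix
print(C_hat.sum(axis=1) - C_hat.sum(axis=0))  # all zeros: balanced
```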

Balancing Graphs

Define the unbalanced energy \(\operatorname{B}(A) := \|\mathrm{diag}([A,A^\ast])\|^2 = \sum_i \left(\|A_i\|^2 - \|A^i\|^2\right)^2\).

Let \(\mathscr{F}(A_0,0) = A_0, \frac{d}{dt}\mathscr{F}(A_0,t) = - \nabla \operatorname{B}(\mathscr{F}(A_0,t))\) be the negative gradient flow of \(\operatorname{B}\).

Theorem [with Needham]

For any \(A_0 \in \mathbb{C}^{d \times d}\), the matrix \(A_\infty := \lim_{t \to \infty} \mathscr{F}(A_0,t)\) exists, is balanced, has the same eigenvalues and principal minors as \(A_0\), and has zero entries whenever \(A_0\) does.

If \(A_0\) is real, so is \(A_\infty\), and if \(A_0\) has all non-negative entries, then so does \(A_\infty\).

This is “local”: \(a_{ij}\) is updated by a multiple of \((\|A_j\|^2-\|A^j\|^2)-(\|A_i\|^2-\|A^i\|^2)\).
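The local update rule can be sketched as an explicit Euler scheme (an assumption of this example; the theorem concerns the continuous flow). Writing \(b_i = \|A_i\|^2 - \|A^i\|^2\), the gradient has entries \(4a_{ij}(b_i - b_j)\), so each step rescales every entry in place. The step size and function names are illustrative.

```python
import numpy as np


def imbalance(A):
    """b_i = ||A_i||^2 - ||A^i||^2 (squared row norm minus squared column norm)."""
    sq = np.abs(A) ** 2
    return sq.sum(axis=1) - sq.sum(axis=0)


def unbalanced_energy(A):
    """B(A) = sum_i b_i^2."""
    return float(np.sum(imbalance(A) ** 2))


def balance_step(A, step):
    """One Euler step: a_ij <- a_ij * (1 - 4 * step * (b_i - b_j))."""
    b = imbalance(A)
    return A * (1.0 - 4.0 * step * (b[:, None] - b[None, :]))


def balance(A0, step=0.01, n_steps=20000):
    A = A0.copy()
    for _ in range(n_steps):
        A = balance_step(A, step)
    return A


rng = np.random.default_rng(3)
A0 = rng.random((4, 4))
A0[0, 2] = 0.0                      # two missing edges
A0[3, 1] = 0.0
A0 /= np.linalg.norm(A0)            # unit Frobenius norm keeps the fixed step stable

A_end = balance(A0)
print("B:", unbalanced_energy(A0), "->", unbalanced_energy(A_end))
print("squared norm:", np.linalg.norm(A_end) ** 2)
```

Because the update is multiplicative, zero entries stay zero, and for a small enough step the multipliers are positive, so non-negative entries stay non-negative. Diagonal entries have multiplier exactly 1 and do not move.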

[Figure: example graph before and after balancing, labeled \(\|A\|^2=1\) and \(\|A\|^2=0.569\).]

Preserving Weights

By doing gradient flow \(\overline{\mathscr{F}}\) on the unit sphere, we can preserve weights:

Theorem [with Needham]

For any non-nilpotent \(A_0 \in \mathbb{C}^{d \times d}\) with \(\|A_0\|^2=1\), the matrix \(A_\infty := \lim_{t \to \infty} \overline{\mathscr{F}}(A_0,t)\) exists, is balanced, has Frobenius norm 1, and has zero entries whenever \(A_0\) does.

If \(A_0\) is real, so is \(A_\infty\), and if \(A_0\) has all non-negative entries, then so does \(A_\infty\).
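One way to imitate the sphere-constrained flow numerically is projected gradient descent: project the gradient of \(\operatorname{B}\) onto the tangent space of the unit sphere, take a step, and renormalize. This is a sketch under that assumption, and it is not claimed to be the construction of \(\overline{\mathscr{F}}\) used in the theorem. The step size and names are illustrative.

```python
import numpy as np


def imbalance(A):
    """b_i = ||A_i||^2 - ||A^i||^2."""
    sq = np.abs(A) ** 2
    return sq.sum(axis=1) - sq.sum(axis=0)


def unbalanced_energy(A):
    return float(np.sum(imbalance(A) ** 2))


def sphere_step(A, step):
    """Tangent-projected gradient step followed by renormalization."""
    b = imbalance(A)
    G = 4.0 * A * (b[:, None] - b[None, :])      # Euclidean gradient of B
    radial = np.sum(A.conj() * G).real           # <A, G>, valid when ||A|| = 1
    A_new = A - step * (G - radial * A)          # move along the tangent part
    return A_new / np.linalg.norm(A_new)         # retract to the unit sphere


def balance_on_sphere(A0, step=0.01, n_steps=20000):
    A = A0 / np.linalg.norm(A0)
    for _ in range(n_steps):
        A = sphere_step(A, step)
    return A


rng = np.random.default_rng(4)
A0 = rng.random((4, 4))       # positive diagonal, so the trace is nonzero
A0[0, 2] = 0.0                # and A0 is not nilpotent
A0[3, 1] = 0.0
A0 /= np.linalg.norm(A0)

A_end = balance_on_sphere(A0)
print("B:", unbalanced_energy(A0), "->", unbalanced_energy(A_end))
print("squared norm:", np.linalg.norm(A_end) ** 2)
```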

Applications?

Equal-Norm Parseval Frames

A spanning set \(f_1, \dots , f_n \in \mathbb{C}^d\) is a frame.

Collect the frame vectors as the columns of \(F = [f_1 \cdots f_n] \in \mathbb{C}^{d \times n}\).

Definition.

\(\{f_1,\dots, f_n\}\subset \mathbb{C}^d\) is a Parseval frame if \(\operatorname{Id}_{d\times d}=FF^*=f_1f_1^*+\dots+f_nf_n^*\).

An equal-norm Parseval frame (ENP frame) is a Parseval frame \(f_1,\dots , f_n\) with \(\|f_i\|^2=\|f_j\|^2\) for all \(i\) and \(j\).

\(\sum \|f_i\|^2=\operatorname{tr}F^*F=\operatorname{tr}FF^*=\operatorname{tr}\operatorname{Id}_{d \times d} = d\), so each \(\|f_i\|^2=\frac{d}{n}\).
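As a concrete instance, three equally spaced vectors in \(\mathbb{R}^2\), scaled to squared norm \(\frac{2}{3}\), form an ENP frame with \(d=2\) and \(n=3\). This is a standard example chosen here for illustration; it does not appear in the talk.

```python
import numpy as np

d, n = 2, 3
angles = 2 * np.pi * np.arange(n) / n

# Columns are the frame vectors f_1, f_2, f_3, each with squared norm d/n.
F = np.sqrt(d / n) * np.vstack([np.cos(angles), np.sin(angles)])

print(np.allclose(F @ F.T, np.eye(d)))   # True: F F* = Id, so F is Parseval
print(np.sum(F ** 2, axis=0))            # each entry is d/n = 2/3 up to rounding
```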

Frame Potential

Definition [Benedetto–Fickus, Casazza–Fickus]

The frame potential is

\(\operatorname{FP}(F) = \|FF^\ast\|_{\operatorname{Fr}}^2\)

Proposition [cf. Welch]

The equal-norm Parseval frames are exactly the global minima of \(\operatorname{FP}|_{\text{equal norm}}\).

Theorem [Benedetto–Fickus]

As a function on equal-norm frames with fixed \(d\) and \(n\), \(\operatorname{FP}\) has no spurious local minima.
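A small sketch comparing the frame potential of an ENP frame with that of a random equal-norm frame of the same total norm. The function name `frame_potential` and the random test frame are illustrative. For frames with \(\operatorname{tr} FF^\ast = d\), the Cauchy–Schwarz inequality gives \(\operatorname{FP}(F) \geq d\), with equality exactly when \(FF^\ast = \operatorname{Id}_{d \times d}\).

```python
import numpy as np


def frame_potential(F):
    """FP(F) = ||F F*||_Fr^2."""
    return np.linalg.norm(F @ F.conj().T, "fro") ** 2


d, n = 2, 3
angles = 2 * np.pi * np.arange(n) / n
enp = np.sqrt(d / n) * np.vstack([np.cos(angles), np.sin(angles)])  # an ENP frame

rng = np.random.default_rng(5)
G = rng.normal(size=(d, n))
G *= np.sqrt(d / n) / np.linalg.norm(G, axis=0)  # rescale columns to squared norm d/n

print(frame_potential(enp))  # equals d = 2 up to rounding
print(frame_potential(G))    # at least d
```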

Optimization

Theorem [with Mixon, Needham, and Villar; FFT 2021 video]

On the space of equal-norm frames, consider the initial value problem

\(\Gamma(F_0,0) = F_0, \qquad \frac{d}{dt}\Gamma(F_0,t) = -\operatorname{grad}\operatorname{FP}(\Gamma(F_0,t))\).

If \(F_0\) has full spark, then \(\lim_{t \to \infty} \Gamma(F_0,t)\) is an ENP frame.

Theorem [with Needham; CodEx 2022 video]

Same for fusion frames.

Frame Energy

Definition [Bodmann–Casazza]

The frame energy is

\(\operatorname{FE}(F) = \sum_{j,k} \left( \|f_j\|^2 - \|f_k\|^2\right)^2 = 2n \sum_j \|f_j\|^4 -2d^2\)
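The second equality can be checked directly. It uses the Parseval condition \(\sum_j \|f_j\|^2 = d\), so this worked step applies to Parseval frames and not to arbitrary \(F\). Writing \(x_j := \|f_j\|^2\) (shorthand introduced only for this computation),

\(\sum_{j,k} \left(x_j - x_k\right)^2 = \sum_{j,k} \left(x_j^2 + x_k^2 - 2x_jx_k\right) = 2n\sum_j x_j^2 - 2\Big(\sum_j x_j\Big)^2 = 2n \sum_j \|f_j\|^4 - 2d^2.\)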

Proposition [Bodmann–Haas]

The equal-norm Parseval frames are exactly the global minima of \(\operatorname{FE}|_{\text{Parseval}}\).

Theorem [Caine]

On the space of Parseval frames, consider the initial value problem

\(\widetilde{\Gamma}(F_0,0) = F_0 \qquad \frac{d}{dt}\widetilde{\Gamma}(F_0,t) = -\operatorname{grad} \operatorname{FE}(\widetilde{\Gamma}(F_0,t))\).

If \(F_0\) is full spark, then \(\lim_{t \to \infty} \widetilde{\Gamma}(F_0,t)\) is an ENP frame.

Frame potential: \(\operatorname{FP}(F) = \|FF^\ast\|_{\operatorname{Fr}}^2\)

Frame energy: \(\operatorname{FE}(F) = 2n \sum_j \|f_j\|^4 -2d^2\), or, dropping the positive factor \(2n\) and the constant \(2d^2\) (which does not change the minimizers), \(\sum_j \|f_j\|^4\)

Total Frame Energy

Definition.

For \(\vec{r} \in \mathbb{R}_+^n\), the total frame energy of a frame is

\(E_{\vec{r}}(F) :=\|FF^* - \mathbb{I}_d \|_{\text{Fr}}^2 + \frac{1}{4}\sum_j \left(\frac{\|f_j\|^2}{r_j}-1 \right)^2\)

Proposition.

The Parseval frames with \(\|f_j\|^2 = r_j\) for \(j=1,\dots , n\) are exactly the global minima of \(E_{\vec{r}}\).
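A direct evaluation of \(E_{\vec{r}}\) illustrates the proposition on a small example. The function name and the test frame are illustrative assumptions. The frame below is Parseval with squared norms \((1, \frac{1}{2}, \frac{1}{2})\), so its energy vanishes for that \(\vec{r}\) and is positive for the equal-norm target \((\frac{2}{3}, \frac{2}{3}, \frac{2}{3})\).

```python
import numpy as np


def total_frame_energy(F, r):
    """E_r(F) = ||F F* - I||_Fr^2 + (1/4) * sum_j (||f_j||^2 / r_j - 1)^2."""
    d = F.shape[0]
    col_sq = np.sum(np.abs(F) ** 2, axis=0)
    parseval_term = np.linalg.norm(F @ F.conj().T - np.eye(d), "fro") ** 2
    norm_term = 0.25 * np.sum((col_sq / r - 1.0) ** 2)
    return parseval_term + norm_term


s = 1 / np.sqrt(2)
F = np.array([[1.0, 0.0, 0.0],
              [0.0, s, s]])     # columns f_1 = e_1, f_2 = f_3 = e_2 / sqrt(2)

r_matching = np.array([1.0, 0.5, 0.5])
r_equal = np.full(3, 2 / 3)

print(total_frame_energy(F, r_matching))  # zero up to rounding
print(total_frame_energy(F, r_equal))     # approximately 0.09375
```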

Variations on a Theme

Theorem [with Caine and Needham]

Let \(\vec{r} \in \mathbb{Q}_+^n\) such that \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r}) \neq \emptyset\). Consider

\(\Gamma_{\vec{r}}(F_0,0) = F_0,\quad \frac{d}{dt} \Gamma_{\vec{r}}(F_0,t) = -\nabla E_{\vec{r}}(\Gamma_{\vec{r}}(F_0,t))\).

If \(F_0 \in \mathbb{K}^{d \times n}\) is full spark, then \(\lim_{t \to \infty} \Gamma_{\vec{r}}(F_0,t)\) is in \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r})\).

Definition.

For \(\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}\) and \(\vec{r} \in \mathbb{R}_+^n\), let \(\operatorname{PF}^{\mathbb{K}}_d(\vec{r})\) be the space of Parseval frames \(F \in \mathbb{K}^{d \times n}\) with \(\|f_i\|^2 = r_i\).

When \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r}) \neq \emptyset\), we say \(\vec{r}\) is admissible.
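A hedged numerical sketch of the flow \(\Gamma_{\vec{r}}\), again using explicit Euler steps, which is an assumption of this example and not part of the theorem. The gradient used is \(\nabla E_{\vec{r}}(F) = 4(FF^\ast - \mathbb{I}_d)F + F\,\mathrm{diag}\!\left(\frac{1}{r_j}\left(\frac{\|f_j\|^2}{r_j} - 1\right)\right)\), derived for the real inner product \(\mathrm{Re}\,\mathrm{Tr}(Y^\ast X)\). The step size, step count, and function names are illustrative.

```python
import numpy as np


def total_frame_energy(F, r):
    """E_r(F) = ||F F* - I||_Fr^2 + (1/4) * sum_j (||f_j||^2 / r_j - 1)^2."""
    d = F.shape[0]
    col_sq = np.sum(np.abs(F) ** 2, axis=0)
    S = F @ F.conj().T - np.eye(d)
    return np.linalg.norm(S, "fro") ** 2 + 0.25 * np.sum((col_sq / r - 1.0) ** 2)


def grad_total_frame_energy(F, r):
    """4 (F F* - I) F, plus column j scaled by (||f_j||^2 / r_j - 1) / r_j."""
    d = F.shape[0]
    col_sq = np.sum(np.abs(F) ** 2, axis=0)
    S = F @ F.conj().T - np.eye(d)
    return 4.0 * S @ F + F * ((col_sq / r - 1.0) / r)


def descend(F0, r, step=0.005, n_steps=20000):
    """Explicit Euler steps for dF/dt = -grad E_r(F)."""
    F = F0.copy()
    for _ in range(n_steps):
        F = F - step * grad_total_frame_energy(F, r)
    return F


d, n = 2, 4
r = np.full(n, d / n)                  # equal-norm target, admissible for n >= d

rng = np.random.default_rng(6)
F0 = rng.normal(size=(d, n))           # a generic Gaussian matrix has full spark
F0 *= np.sqrt(d) / np.linalg.norm(F0)  # start with tr(F0 F0*) = d

F_end = descend(F0, r)
print("E_r:", total_frame_energy(F0, r), "->", total_frame_energy(F_end, r))
print("squared column norms:", np.sum(F_end ** 2, axis=0))
```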

Topology of Frame Spaces

Frame Homotopy Conjecture [Larson 2002]

The space of ENP frames is path-connected.

Theorem [Cahill, Mixon, Strawn 2017]

The Frame Homotopy Conjecture is true.

Theorem [with Needham 2022; CodEx 2021 video]

\(\operatorname{PF}_d^{\mathbb{H}}(\vec{r})\) is path-connected for all admissible \(\vec{r} \in \mathbb{R}_+^n\).

Theorem [with Needham 2021; CodEx 2021 video]

\(\operatorname{PF}_d^{\mathbb{C}}(\vec{r})\) is path-connected for all admissible \(\vec{r}\in \mathbb{R}_+^n\).

Theorem [Mare 2024]

\(\operatorname{PF}_d^{\mathbb{R}}(\vec{r})\) is path-connected for certain \(\vec{r}\) with entries repeating in special patterns.

Theorem [Mare 2025]

\(\operatorname{PF}_d^{\mathbb{C}}(\vec{r})\) is simply-connected for admissible \(\vec{r} \in \mathbb{R}_+^n\) so that all \(r_{i_1} + \dots + r_{i_{n-k}} \geq 1\).

Topology from Optimization

Meta-Theorem

Gradient flow of \(E_{\vec{r}}\) gives a deformation retract of \(\mathbb{K}^{d \times n} \setminus \mathcal{U}_{\vec{r}}\) onto \(\operatorname{PF}_d^{\mathbb{K}}(\vec{r})\).

Corollary [with Caine and Needham]

Let \(q \in \mathbb{Z}_{\geq 0}\) and 

\(d \geq \begin{cases} q+2 & \text{if } \mathbb{K}= \mathbb{R}\\ \frac{q+2}{2} & \text{if } \mathbb{K} = \mathbb{C}.\end{cases}\)

Then the space of ENP frames is \(q\)-connected for all \(n \geq \frac{d}{d-1}(d+q+1)\).

Corollary [with Caine and Needham]

Let \(d \geq 2\). Then there exists \(\epsilon > 0\) so that, for any admissible \(\vec{r} \in \mathbb{Q}_+^n\) with \(\left|r_i - \frac{d}{n}\right| < \epsilon\) for all \(i\), the space \(\operatorname{PF}_d^{\mathbb{R}}(\vec{r})\) is path-connected.

Questions

Can we characterize exactly when \(\operatorname{PF}_d^{\mathbb{R}}(\vec{r})\) is path-connected?

Theorem [Kapovich–Millson 1995]

\(\operatorname{PF}_2^{\mathbb{R}}(\vec{r})/(SO(2) \times O(1)^n)\) is disconnected if and only if there exist \(i,j,k\) so that \(r_i + r_j > 1\), \(r_j + r_k > 1\), and \(r_k + r_i > 1\). (Note: if \(\vec{r}\) is admissible, \(r_1 + \dots + r_n = 2\).)

What about other frame operators?

Can these methods be made numerically tractable?

What happens when \(\vec{r}\) is not admissible?

What other loss functions of interest fit this framework?

Thank you!

shonkwiler.org/codex26

Reference

Geometric approaches to matrix normalization and graph balancing

Tom Needham and Clayton Shonkwiler

Forum of Mathematics, Sigma 13 (2025), e149