A Natural Map from Random Walks to Equilateral Polygons in Any Dimension

Clayton Shonkwiler

Colorado State University

http://shonkwiler.org

11.07.17

http://shonkwiler.org/cmo17 (this talk)

Statistical physics viewpoint

A polymer in solution takes on an ensemble of random shapes, with topology as the unique conserved quantity.

Modern polymer physics is based on the analogy between a polymer chain and a random walk.

– Alexander Grosberg

Protonated P2VP

Roiter/Minko

Clarkson University

Plasmid DNA

Alonso–Sarduy, Dietler Lab

EPF Lausanne

Sampling random walks in \(\mathbb{R}^d\) is easy

Generate \(n\) independent uniform random points on \(S^{d-1}\) and treat them as an ordered list of edge vectors.
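A minimal NumPy sketch of this sampler, assuming edges are stored as the rows of an \(n \times d\) array (the names `random_arm` and `vertices` are ours, not from the talk). Normalizing standard Gaussian vectors is one standard way to get uniform points on \(S^{d-1}\).

```python
import numpy as np

def random_arm(n, d, rng=None):
    """Equilateral n-edge arm in R^d: n i.i.d. uniform unit edge vectors (rows)."""
    rng = np.random.default_rng(rng)
    # A standard Gaussian vector divided by its norm is uniform on S^{d-1}.
    edges = rng.standard_normal((n, d))
    edges /= np.linalg.norm(edges, axis=1, keepdims=True)
    return edges

def vertices(edges):
    """Vertex positions of the walk, starting at the origin (n + 1 rows)."""
    return np.vstack([np.zeros(edges.shape[1]), np.cumsum(edges, axis=0)])

arm = random_arm(17, 3, rng=0)
print(np.linalg.norm(vertices(arm)[-1]))  # end-to-end distance; generally nonzero
```

The walk is almost never closed, which is exactly the problem the rest of the talk addresses.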

...but sampling random polygons was hard

Alvarado, Calvo, Millett, J. Stat. Phys. 143 (2011), 102–138

Key idea: commuting symmetries

Rotations around \(n-3\) chords \(d_i\) by \(n-3\) angles \(\theta_i\) commute.

A polytope

The \((n-3)\)-dimensional moment polytope \(\mathcal{P}_n \subset \mathbb{R}^{n-3}\) is defined by the triangle inequalities

\[
\begin{aligned}
0 &\leq d_1 \leq 2, \\
1 &\leq d_i + d_{i-1} \qquad (2 \leq i \leq n-3), \\
|d_i - d_{i-1}| &\leq 1 \qquad (2 \leq i \leq n-3), \\
0 &\leq d_{n-3} \leq 2.
\end{aligned}
\]
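As a small worked example (our own computation, not from the talk), take \(n = 5\). Then \(\mathcal{P}_5 \subset \mathbb{R}^2\) is the square \([0,2]^2\) with three disjoint corner triangles of area \(1/2\) removed (where \(d_1 + d_2 < 1\), \(d_2 - d_1 > 1\), or \(d_1 - d_2 > 1\)). Its area, and the marginal density of \(d_1\) under the uniform measure on \(\mathcal{P}_5\), are:

```latex
\[
\mathcal{P}_5 = \{(d_1,d_2) : 0 \le d_1, d_2 \le 2,\ 1 \le d_1 + d_2,\ |d_1 - d_2| \le 1\},
\qquad
\operatorname{Area}(\mathcal{P}_5) = 4 - 3\cdot\tfrac{1}{2} = \tfrac{5}{2},
\]
\[
\text{pdf of } d_1 =
\frac{\operatorname{length}\{d_2 : (d_1,d_2) \in \mathcal{P}_5\}}{5/2}
=
\begin{cases}
\tfrac{4}{5}\,d_1, & 0 \le d_1 \le 1,\\[2pt]
\tfrac{2}{5}\,(3 - d_1), & 1 \le d_1 \le 2.
\end{cases}
\]
```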

From action-angle coordinates to polygons

There exists an almost-everywhere defined map \(\alpha: \mathcal{P}_n \times (S^1)^{n-3} \to \text{Pol}_3(n)/SO(3)\).

This is only sensible as a map to polygons modulo translation and rotation.

Sampling polygons in \(\mathbb{R}^3\)

Theorem (with Cantarella, 2016)

The map \(\alpha\) pushes forward the standard probability measure on \(\mathcal{P}_n \times (S^1)^{n-3}\) to the correct probability measure on \(\text{Pol}_3(n)/SO(3)\).

Corollary

Independently sampling \(\mathcal{P}_n\) and \((S^1)^{n-3}\) is a perfect sampling algorithm for equilateral \(n\)-gons in \(\mathbb{R}^3\).
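A sketch of this sampler under stated assumptions: we realize \(\alpha\) through the fan triangulation from \(v_1\) (so \(d_i = \|v_{i+2} - v_1\|\)), measure each \(\theta_i\) as a rotation about the diagonal \(d_i\) away from the planar position, and sample \(\mathcal{P}_n\) by naive rejection from the cube \([0,2]^{n-3}\). The rejection step is exact but exponentially slow in \(n\); it is not the direct sampler of Cantarella, Duplantier, Shonkwiler & Uehara. All function names are hypothetical.

```python
import numpy as np

def in_polytope(d):
    """True if d = (d_1, ..., d_{n-3}) satisfies the triangle inequalities defining P_n."""
    d = np.asarray(d, dtype=float)
    if np.any(d < 0.0) or np.any(d > 2.0):
        return False
    return bool(np.all(d[1:] + d[:-1] >= 1.0) and np.all(np.abs(d[1:] - d[:-1]) <= 1.0))

def sample_polytope(n, rng):
    """Uniform point of P_n by rejection from the cube [0, 2]^(n-3) (acceptance shrinks fast with n)."""
    while True:
        d = rng.uniform(0.0, 2.0, size=n - 3)
        if in_polytope(d):
            return d

def polygon_from_action_angle(d, theta):
    """Rows v_1, ..., v_n of an equilateral n-gon in R^3 with v_1 at the origin,
    fan diagonals |v_{i+2} - v_1| = d_i, and the triangle beyond diagonal i
    rotated by theta_i about that diagonal away from the planar position."""
    n = len(d) + 3
    radii = np.append(np.asarray(d, dtype=float), 1.0)   # |v_3|, ..., |v_n|; the last edge closes at length 1
    V = np.zeros((n, 3))
    V[2] = [radii[0], 0.0, 0.0]
    V[1] = [radii[0] / 2.0, np.sqrt(max(1.0 - radii[0] ** 2 / 4.0, 0.0)), 0.0]
    for j in range(2, n - 1):                             # V[j] known, place V[j+1]
        a, b = radii[j - 2], radii[j - 1]                 # a = |V[j]|, b = required |V[j+1]|
        u = V[j] / a                                      # unit vector along the diagonal
        x = (a * a + b * b - 1.0) / (2.0 * a)             # coordinate of V[j+1] along u
        h = np.sqrt(max(b * b - x * x, 0.0))              # distance of V[j+1] from the diagonal
        w = V[j - 1] - np.dot(V[j - 1], u) * u            # part of V[j-1] perpendicular to u
        w = -w / np.linalg.norm(w)                        # planar position: other side of the diagonal
        t = theta[j - 2]
        w = np.cos(t) * w + np.sin(t) * np.cross(u, w)    # rotate by theta about u
        V[j + 1] = x * u + h * w
    return V

def sample_closed_polygon(n, rng=None):
    """Independent uniform samples on P_n and on the torus (S^1)^(n-3), mapped to a polygon."""
    rng = np.random.default_rng(rng)
    d = sample_polytope(n, rng)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n - 3)
    return polygon_from_action_angle(d, theta)

V = sample_closed_polygon(10, rng=0)
edges = np.roll(V, -1, axis=0) - V      # v_{k+1} - v_k, with v_{n+1} = v_1
print(np.linalg.norm(edges, axis=1))    # all 1 up to rounding
```

The angle convention is a free choice here: shifting every \(\theta_i\) by a fixed offset preserves the uniform measure on \((S^1)^{n-3}\). Degenerate diagonals (for example \(d_i = 0\)) occur with probability zero and are not handled.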

Theorem (with Cantarella, Duplantier, Uehara, 2016)

A direct sampler with expected runtime \(\Theta(n^{5/2})\).

Kyle Chapman has an even faster sampler he will talk about on Thursday.

Unfortunately, this is all very special to 3 dimensions...

Strategy

  1. Find a natural map \(g:\text{Arm}_d(n) \to \text{Pol}_d(n)\).
  2. Sample points in \(\text{Arm}_d(n)\) and apply \(g\).
  3. Hope the pushforward measure is almost uniform.

Efficient closure

Idea: \(g\) should map each arm to the closed polygon which is closest to it.

Problem: This is not well-defined on all of \(\text{Arm}_d(n)\).

What is the closest closed polygon?

Relatedly: \(\text{Arm}_d(n) \simeq (S^{d-1})^n\) and \(\text{Pol}_d(n)\) have different topologies, so there's no retraction defined on all of \(\text{Arm}_d(n)\).

The geometric median

Definition

A geometric median (or Fermat–Weber point) of a collection \(X=\{x_1,\ldots , x_n\}\) of points in \(\mathbb{R}^d\) is any point minimizing the total distance to the \(x_i\):

\(\text{gm}(X)=\text{argmin}_y \sum \|x_i-y\|\)
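The talk does not say how \(\text{gm}(X)\) is computed. One classical option, swapped in here as an assumption, is Weiszfeld's fixed-point iteration \(y \leftarrow \left(\sum_i x_i/\|x_i - y\|\right) / \left(\sum_i 1/\|x_i - y\|\right)\), which is undefined if an iterate lands exactly on a data point. A minimal NumPy sketch (names are ours):

```python
import numpy as np

def geometric_median(points, tol=1e-12, max_iter=10_000):
    """Weiszfeld iteration for argmin_y sum_i ||x_i - y||.
    Assumes no iterate lands exactly on a data point."""
    X = np.asarray(points, dtype=float)
    y = X.mean(axis=0)                      # start at the centroid
    for _ in range(max_iter):
        dist = np.linalg.norm(X - y, axis=1)
        if np.any(dist < 1e-15):
            break                           # plain Weiszfeld is undefined at a data point
        w = 1.0 / dist
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

triangle = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])
print(geometric_median(triangle))           # the Fermat point of this acute triangle
```

At convergence the unit vectors \((x_i - y)/\|x_i - y\|\) sum to (nearly) zero, which is the first-order condition for the minimum.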

Definition

A point cloud has a nice geometric median if:

  • \(\text{gm}(X)\) is unique (this holds whenever \(X\) is not contained in a line)
  • \(\text{gm}(X)\) is not one of the \(x_i\)

Geometric median of a triangle

Geometric median of a quadrilateral

Geometric median closure

Definition

If the edge cloud \(X\) of an equilateral arm has a nice geometric median, the geometric median closure \(\text{gmc}(X)\) recenters the edge cloud at the geometric median and rescales each edge back to unit length:

\(\text{gmc}(X)_i = \frac{x_i-\text{gm}(X)}{\|x_i - \text{gm}(X)\|}\)
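A minimal sketch of \(\text{gmc}\), with the same hedges: the edges are rows of an \(n \times d\) array, \(\text{gm}(X)\) is computed with Weiszfeld's iteration (our choice, not necessarily the authors'), and an error is raised when the median lands on an edge vector, one of the ways the closure can fail. Collinear edge clouds (non-unique median) are not detected. Names are hypothetical.

```python
import numpy as np

def geometric_median(X, tol=1e-13, max_iter=100_000):
    """Weiszfeld iteration for argmin_y sum_i ||x_i - y|| (assumes no iterate hits an x_i)."""
    y = X.mean(axis=0)
    for _ in range(max_iter):
        w = 1.0 / np.linalg.norm(X - y, axis=1)
        y_new = w @ X / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def gm_closure(edges):
    """Geometric median closure: recenter the edge cloud at gm(X), then rescale edges to length 1."""
    X = np.asarray(edges, dtype=float)
    Y = X - geometric_median(X)
    lengths = np.linalg.norm(Y, axis=1, keepdims=True)
    if lengths.min() < 1e-12:
        raise ValueError("geometric median (numerically) coincides with an edge; no gmc")
    return Y / lengths

rng = np.random.default_rng(1)
arm = rng.standard_normal((17, 3))
arm /= np.linalg.norm(arm, axis=1, keepdims=True)
polygon = gm_closure(arm)
print(np.linalg.norm(polygon.sum(axis=0)))  # closure defect: tiny, up to rounding
```

The near-zero closure defect is the numerical counterpart of the fact that the geometric median closure is closed.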

Closing a 17-edge arm

Loop closure

1QMG – Acetohydroxyacid isomeroreductase

The geometric median closure is closed

Proposition (with Cantarella)

If it exists, the geometric median closure of an arm is a closed polygon.

Proof

\(\text{gm}(X)\) minimizes the total distance function

\(d_X(y) = \sum_i \|x_i - y\|\),

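which is convex everywhere and smooth away from the \(x_i\), with

\(\nabla d_X(y) = -\sum_i \frac{x_i-y}{\|x_i-y\|}\).

If the geometric median is nice, \(\text{gm}(X)\) is a minimum of \(d_X\) at which \(d_X\) is smooth, so \(\nabla d_X(\text{gm}(X)) = 0\). This says exactly that \(\sum_i \text{gmc}(X)_i = 0\): the unit edges of \(\text{gmc}(X)\) sum to zero, so the polygon closes.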

The geometric median closure is optimal

Definition

An arm or polygon \(X\) is given by \(n\) edge vectors \(x_i \in \mathbb{R}^d\), or a single point in \(\mathbb{R}^{dn}\). The distance between \(X\) and \(Y\) is the Euclidean distance between these points in \(\mathbb{R}^{dn}\).

Theorem (with Cantarella)

If \(X\) is an equilateral arm in \(\mathbb{R}^d\) with a geometric median closure, then \(\text{gmc}(X)\) is the closest equilateral polygon to \(X\).

Proof

Depends on the neat fact that if \(\|x_i\|=\|y_i\|\) for all \(i\), then

\(\langle X, Y -X \rangle \leq 0\).
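For completeness, the neat fact follows from Cauchy–Schwarz applied edge by edge (our one-line derivation, not taken from the slides):

```latex
\[
\langle X, Y - X\rangle
= \sum_{i=1}^n \left(\langle x_i, y_i\rangle - \|x_i\|^2\right)
\le \sum_{i=1}^n \left(\|x_i\|\,\|y_i\| - \|x_i\|^2\right)
= 0,
\qquad \text{since } \|x_i\| = \|y_i\|.
\]
```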

gmc can fail, but not often

Theorem* (with Cantarella)

The fraction of \(\text{Arm}_d(n)\) without a geometric median closure \(\to 0\) exponentially fast in \(n\).

Proof strategy

\(\text{Ad}(p) = \sqrt{1+\|p\|^2}\ {}_2F_1\!\left(-\frac{1}{4},\frac{1}{4};\frac{d}{2};\frac{4\|p\|^2}{(1+\|p\|^2)^2}\right)\)

For a random point cloud \(X = (x_1, \ldots , x_n)\) on \(S^{d-1}\), want to show \(\text{gm}(X)\) is close to the origin (and therefore \(\text{gmc}(X)\) exists) with high probability.

Claim: This follows if we can show that \(\frac{1}{n} d_X\) is \(L^\infty\)-close to \(\text{Ad}: B^d \to \mathbb{R}\), where \(\text{Ad}(p)\) is the average distance from a uniformly random point of \(S^{d-1}\) to \(p \in B^d\).
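To make \(\text{Ad}\) concrete, the sketch below evaluates \(\sqrt{1+\|p\|^2}\;{}_2F_1\!\left(-\tfrac14, \tfrac14; \tfrac d2; \tfrac{4\|p\|^2}{(1+\|p\|^2)^2}\right)\) with SciPy's `scipy.special.hyp2f1` (the ordinary, non-regularized Gauss function) and compares it with a Monte Carlo average over uniform points of \(S^{d-1}\). Since \(\mathbb{E}\|x_i - p\| = \text{Ad}(p)\) for each \(i\), this is also the function that \(\frac{1}{n}d_X\) approaches. For \(d = 3\) it reduces to \(1 + \|p\|^2/3\) inside the unit ball, which gives an exact check. Function names are ours.

```python
import numpy as np
from scipy.special import hyp2f1

def average_distance(p, d):
    """Ad(p) for |p| < 1: mean distance from p to a uniform point of S^{d-1} (closed form)."""
    r2 = float(np.dot(p, p))
    z = 4.0 * r2 / (1.0 + r2) ** 2
    return np.sqrt(1.0 + r2) * hyp2f1(-0.25, 0.25, d / 2.0, z)

def average_distance_mc(p, d, samples=200_000, seed=0):
    """Monte Carlo estimate of the same average, for comparison."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((samples, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # uniform points on S^{d-1}
    return np.linalg.norm(x - p, axis=1).mean()

p = np.array([0.3, 0.4, 0.0])                       # |p| = 0.5, d = 3
print(average_distance(p, 3), 1 + 0.25 / 3, average_distance_mc(p, 3))
```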

Lemma (Hjort & Pollard, 1993)

Let \(f\) be convex and \(g\) any function with unique argmin at 0. Let \(B_\delta\) be the ball of radius \(\delta\), \(M = \sup_{s \in B_\delta}|f(s) -g(s)|\), \(m = \inf_{s \in \partial B_\delta}|g(s) - g(0)|\). If \(M < m/2\), then argmin \(f \in B_\delta\).

Concentration of measure

Theorem (Bernstein inequality for Hilbert spaces)

Let \((\Omega,\mathcal{A},P)\) be a probability space, \(H\) a separable Hilbert space, \(B > 0, \sigma > 0\). If \(\xi_1,\ldots , \xi_n:\Omega \to H\) are independent r.v.'s satisfying \(\mathbb{E}(\xi_i)=0, \|\xi_i\|_\infty \leq B, \mathbb{E}(\|\xi_i\|_H^2)\leq \sigma^2\), then

\(P\left(\left\|\frac{1}{n}\sum \xi_i\right\|_H \geq \sqrt{\frac{2\sigma^2 \tau}{n}} + \sqrt{\frac{\sigma^2}{n}} + \frac{2B\tau}{3n}\right) \leq e^{-\tau}\)

In our case, let \(\xi_i(p) = \|x_i-p\| - \text{Ad}(p)\) and \(H = L^2(B^d)\). Then, e.g., 

\(P\left(\left\|\frac{1}{n}d_X-\text{Ad}\right\|_2 > \frac{30}{n^{1/3}}\right) \leq e^{-n^{1/3}}.\)

In other words, \(\frac{1}{n}d_X\) and \(\text{Ad}\) are close in \(L^2\) with high probability.

A reverse bound

Proposition (with Cantarella)

If \(f\) is differentiable on \(B^d\) with \(\|\nabla f\|_\infty \leq K\), then

\[
\|f\|_\infty \leq \frac{2}{\pi^{d/2}\left(\frac{2}{\Gamma(d/2+1)}-\frac{2^d K d\, \Gamma(d/2)}{(d+1)\sqrt{\pi}\,\Gamma(d+1/2)}\right)}\|f\|_1.
\]

By Cauchy–Schwarz, \(\|f\|_1 \leq \text{Vol}(B^d)^{1/2} \|f\|_2\), so we can combine this with the concentration result to see that \(\frac{1}{n}d_X\) is \(L^\infty\)-close to the average distance function with high probability, and hence the Hjort & Pollard lemma applies.

Is this a good sampler?

If \(X \in (S^{d-1})^n\) is chosen uniformly, then we've seen that \(\text{gmc}(X)\) exists with very high probability.

This produces some distribution on closed polygons.

Question

What is this distribution? Is it uniform?

We can test when \(d=3\)…

Chord lengths \((d_1, \ldots, d_{n-3})\) are (supposed to be) uniform on \(\mathcal{P}_n\)

A more subtle test

Histogram of chord lengths from ~ 1 million pentagons created by sampling arms and applying gmc, compared to exact pdf of random pentagons

Exact chord pdfs

Let \(\phi_n(\ell)\) be the density of the end-to-end distance in an \(n\)-step random flight in \(\mathbb{R}^3\). From Lord Rayleigh,

\[
\phi_n(\ell) = \frac{2\ell}{\pi}\int_0^\infty x \sin(\ell x)\, \text{sinc}^n x \,\text{d}x.
\]

This is piecewise-polynomial of degree \(n-1\).

Proposition

The pdf of the length of the chord connecting \(v_1\) to \(v_{k+1}\) in an \(n\)-gon is a constant multiple of

\[
\ell^{-2}\, \phi_k(\ell)\, \phi_{n-k}(\ell)
\]

(equivalently, \(\ell^2\) times the product of the end-to-end vector densities, since \(\phi_n(\ell)\) is \(4\pi\ell^2\) times the density of the end-to-end vector).
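A sketch for evaluating these pdfs numerically. Rayleigh's integral is oscillatory, so instead of integrating it we swap in Treloar's classical closed form for the same distance density, \(\phi_n(\ell) = \frac{\ell}{2^{n-1}(n-2)!}\sum_{0 \le k \le (n-\ell)/2} (-1)^k \binom{n}{k}(n-2k-\ell)^{n-2}\) for \(n \ge 2\). The chord pdf is proportional to \(\ell^2\) times the product of the end-to-end vector densities, which for distance densities reads \(\phi_k(\ell)\phi_{n-k}(\ell)/\ell^2\). As a sanity check, for \(n = 5\), \(k = 2\) this gives density \(\tfrac45\ell\) on \([0,1]\), the marginal of \(d_1\) under the uniform measure on \(\mathcal{P}_5\). Function names are hypothetical, and the alternating sum loses floating-point precision for large \(n\).

```python
from math import comb, factorial, floor

def flight_pdf(n, ell):
    """Density of |e_1 + ... + e_n| for n i.i.d. uniform unit vectors in R^3 (n >= 2),
    via Treloar's closed form; loses precision for large n (alternating sum)."""
    if ell <= 0.0 or ell >= n:
        return 0.0
    s = sum((-1) ** k * comb(n, k) * (n - 2 * k - ell) ** (n - 2)
            for k in range(floor((n - ell) / 2) + 1))
    return ell * s / (2 ** (n - 1) * factorial(n - 2))

def chord_pdf(n, k, ell, grid=20_000):
    """pdf of |v_{k+1} - v_1| in a random closed equilateral n-gon in R^3 (2 <= k <= n-2),
    proportional to flight_pdf(k) * flight_pdf(n-k) / ell^2."""
    def g(x):
        return flight_pdf(k, x) * flight_pdf(n - k, x) / x ** 2 if x > 0 else 0.0
    top = min(k, n - k)                                    # the chord is at most this long
    h = top / grid
    z = h * sum(g((i + 0.5) * h) for i in range(grid))     # midpoint-rule normalizing constant
    return g(ell) / z

print(chord_pdf(5, 2, 0.5))   # compare with the P_5 marginal (4/5) * 0.5 = 0.4
```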

A more subtle test

Histogram of chord lengths from ~ 1 million 10-gons created by sampling arms and applying gmc, compared to exact pdf of random 10-gons

In the limit...

Conjecture

The probability measure generated by geometric median sampling converges (exponentially fast?) to the uniform distribution on equilateral \(n\)-gons as \(n \to \infty\).

Conjecture

For any function which varies slowly enough, its integral with respect to any permutation-invariant probability measure on equilateral \(n\)-gons converges to its integral with respect to the uniform distribution on equilateral \(n\)-gons as \(n \to \infty\).

Application: Polygon evolutions

Algorithm

To follow a flow in polygon space:

  1. Compute the infinitesimal variation \(V\) in \(\mathbb{R}^{dn}\) at the current polygon \(X^{(t)}\) tangent to \(\text{Pol}_d(n)\) (or at least \(\text{Arm}_d(n)\)).
  2. Follow the exponential map in \(\text{Arm}_d(n)\) in direction \(V\) (to preserve edge lengths exactly).
  3. Use geometric median closure to get \(X^{(t+1)} \in \text{Pol}_d(n)\).
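Step 2 can be sketched as follows, assuming \(\text{Arm}_d(n) \simeq (S^{d-1})^n\) carries the product of round metrics, so its exponential map moves each edge \(x_i\) along a great circle with initial velocity \(v_i\) (after projecting \(v_i\) to be orthogonal to \(x_i\)). Every edge keeps length exactly 1; step 3 would then apply the geometric median closure. Function names are ours.

```python
import numpy as np

def project_to_tangent(X, V):
    """Remove from each v_i its component along x_i: a tangent vector to (S^{d-1})^n at X."""
    return V - np.sum(V * X, axis=1, keepdims=True) * X

def sphere_exp(X, V, t=1.0):
    """Exponential map on (S^{d-1})^n: move each unit edge x_i along a great circle
    with initial velocity t * v_i (projected to the tangent space)."""
    V = t * project_to_tangent(X, V)
    speed = np.linalg.norm(V, axis=1, keepdims=True)
    safe = np.where(speed > 0, speed, 1.0)          # avoid 0/0 for edges that do not move
    return np.cos(speed) * X + np.sin(speed) * V / safe

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = sphere_exp(X, rng.standard_normal((10, 3)), t=0.3)
print(np.linalg.norm(Y, axis=1))                    # all 1 up to rounding
```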

How far can you step?

A safe step distance

Definition

If \(P\) is a closed \(n\)-gon in \(\mathbb{R}^d\) with edge cloud \(X\) and \(\lambda_1\) is the first eigenvalue of \(X^TX\), define

\[
d_{\text{safe}} := \frac{1}{4\sqrt{d}}(d-\lambda_1)
\]

Theorem (with Cantarella)

If \(P\) is a closed equilateral \(n\)-gon, every arm within \(d_{\text{safe}}\) of \(P\) has a geometric median closure. This distance bound is positive if \(P\) is not contained in a line.

The algorithm, redux

Algorithm

To follow a flow in polygon space:

  1. Compute the infinitesimal variation \(V\) in \(\mathbb{R}^{dn}\) at the current polygon \(X^{(t)}\) tangent to \(\text{Pol}_d(n)\) (or at least \(\text{Arm}_d(n)\)).
  2. Find \(\lambda_1\) and use it to compute \(d_{\text{safe}}(X^{(t)})\).
  3. Follow the exponential map in \(\text{Arm}_d(n)\) in direction \(V\) (to preserve edge lengths exactly), but don't go further than \(d_{\text{safe}}\).
  4. Use geometric median closure to get \(X^{(t+1)} \in \text{Pol}_d(n)\).

Example: Energy-based carpenter’s rule

Recap

In any dimension

  • The geometric median provides an optimal loop closure.
  • It works on nearly all arms. The failure set is small, but somewhat hard to describe.
  • The pushforward measure appears to be almost uniform (and is conjectured to become uniform as \(n \to \infty\)).
  • These tools provide clean computational methods for polygon reconfigurations.

Thank you!

References

The symplectic geometry of closed equilateral random walks in 3-space

J. Cantarella & C. Shonkwiler

Annals of Applied Probability 26 (2016), no. 1, 549–596

A fast direct sampling algorithm for equilateral closed polygons

J. Cantarella, B. Duplantier, C. Shonkwiler, & E. Uehara

Journal of Physics A 49 (2016), no. 27, 275202

J. Phys. A Highlight of 2016

A natural map from random walks to closed polygons in any dimension

J. Cantarella & C. Shonkwiler

In preparation

Funding: Simons Foundation
