Estimation of Wasserstein distances in the Spiked Transport Model
Study Group on Optimal Transport
Daniel Yukimura

Contributions:
- Estimation of the Wasserstein distance in high dimension for distributions that differ only in a low-dimensional subspace.
- Lower bounds.
- A computational-statistical gap.
Introduction:
Wasserstein distance
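Prop. 1: Let \(\mu\) be a prob. on \([-1,1]^d\). If \(\mu_n\) is the assoc. empirical measure, then for any \(p \in [1,\infty)\) with \(d > 2p\), \(\mathbb{E}\, W_p(\mu_n, \mu) \lesssim n^{-1/d}\), up to a constant depending on \(p\) and \(d\).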
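To make the object in Prop. 1 concrete, here is a minimal sketch of \(W_p\) between two empirical measures on the real line. It assumes equal sample sizes, in which case sorting both samples gives the optimal coupling for \(p \ge 1\). The function name `wasserstein_1d` and the uniform test data are illustrative choices, not part of the slides.

```python
import random


def wasserstein_1d(xs, ys, p=1.0):
    """W_p between two empirical measures on R with the same number of atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In dimension one, matching the i-th smallest points is optimal for p >= 1.
    xs, ys = sorted(xs), sorted(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


rng = random.Random(0)
for n in (100, 1000, 10000):
    # Two independent samples from the same law on [-1, 1]:
    # the distance between them should shrink as n grows.
    sample_a = [rng.uniform(-1, 1) for _ in range(n)]
    sample_b = [rng.uniform(-1, 1) for _ in range(n)]
    print(n, wasserstein_1d(sample_a, sample_b, p=1))
```

This one-dimensional case is the easy regime; the \(n^{-1/d}\) rate discussed in the slides concerns large \(d\), where no sorting shortcut is available.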
Spiked Transport Model:
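- Fix \(U \subseteq \mathbb{R}^d\) of dim. \(k \ll d\).
- R.v. \(X^{(1)}, X^{(2)} \in U\) with arbitrary distributions.
- R.v. \(Z\), independent of \((X^{(1)}, X^{(2)})\), supported on \(U^\perp\).
- \(\mu^{(i)} := \mathrm{Law}(X^{(i)} + Z)\) for \(i = 1, 2\), so the two measures differ only on the low-dim. subspace \(U\).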
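A minimal sketch of sampling from such a model, under simplifying assumptions that are not in the slides: \(U\) is taken to be the span of the first \(k\) coordinate axes, the spike part is Gaussian with a mean shift, and the shared noise on \(U^\perp\) is standard Gaussian. The name `sample_stm` is hypothetical.

```python
import random


def sample_stm(n, d, k, shift, rng):
    """Draw n points from Law(X + Z): X lives on U = span(e_1..e_k), Z on U-perp."""
    points = []
    for _ in range(n):
        x = [rng.gauss(shift, 1.0) for _ in range(k)]    # spike part, in U
        z = [rng.gauss(0.0, 1.0) for _ in range(d - k)]  # shared noise, in U-perp
        points.append(x + z)                             # concatenate coordinates
    return points


rng = random.Random(0)
# The two measures share the law of Z and differ only through the first k coordinates.
mu1 = sample_stm(n=500, d=20, k=2, shift=0.0, rng=rng)
mu2 = sample_stm(n=500, d=20, k=2, shift=1.0, rng=rng)
print(len(mu1), len(mu1[0]))
```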
Question: Given \(n\) i.i.d. observations from both \(\mu^{(1)}\) and \(\mu^{(2)}\), is it possible to estimate \(W_p(\mu^{(1)}, \mu^{(2)})\) at a rate faster than \(n^{-1/d}\)?
- Answer is yes, but we need smoothness and decay assumptions on the measures.
Concentration Assumptions:
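A prob. meas. \(\mu\) on \(\mathbb{R}^d\) satisfies the \(T_p(\sigma^2)\) transport inequality if \(W_p(\nu, \mu) \le \sqrt{2\sigma^2 D(\nu \,\|\, \mu)}\) for all prob. meas. \(\nu\) with finite \(p\)th moment, where \(D(\nu \,\|\, \mu)\) is the Kullback-Leibler divergence.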
Wasserstein Projection Pursuit:
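- Consider the set \(V_k(\mathbb{R}^d)\) of \(k \times d\) matrices with orthonormal rows.
- For \(\mu \in \mathrm{Prob}(\mathbb{R}^d)\) and \(U \in V_k(\mathbb{R}^d)\), define \(\mu_U\) as the law of \(UY\) for \(Y \sim \mu\).
Def.: For \(k \in [d]\), the \(k\)-dimensional Wasserstein distance between \(\mu^{(1)}\) and \(\mu^{(2)}\) is \(\tilde W_{p,k}(\mu^{(1)}, \mu^{(2)}) := \sup_{U \in V_k(\mathbb{R}^d)} W_p(\mu^{(1)}_U, \mu^{(2)}_U)\).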
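WPP estimator: \(\hat W_{p,k} := \tilde W_{p,k}(\mu^{(1)}_n, \mu^{(2)}_n)\), where \(\mu^{(i)}_n\) is the empirical measure of the \(n\) observations from \(\mu^{(i)}\).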
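For intuition, here is a rough sketch of projection pursuit in the simplest case \(k = 1\), \(p = 1\). It replaces the supremum over directions by the best of a finite number of random unit vectors, which only gives a lower bound on the supremum; this random search is a stand-in chosen for brevity, not the procedure analysed in the talk. All function names and the Gaussian test data are assumptions.

```python
import math
import random


def w1_sorted(a, b):
    """W_1 between two equal-size empirical measures on the real line."""
    a, b = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def random_unit_vector(d, rng):
    """Uniform direction on the sphere, i.e. a 1 x d matrix with an orthonormal row."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project(points, u):
    """Push each point y through the map y -> <u, y>."""
    return [sum(c * ui for c, ui in zip(pt, u)) for pt in points]


def wpp_k1_random_search(sample1, sample2, n_directions, rng):
    """Best projected W_1 over n_directions random directions (a lower bound on the sup)."""
    d = len(sample1[0])
    best = 0.0
    for _ in range(n_directions):
        u = random_unit_vector(d, rng)
        best = max(best, w1_sorted(project(sample1, u), project(sample2, u)))
    return best


rng = random.Random(0)
d, n = 5, 300
sample1 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
# Second sample: same noise law, shifted by 2 along the first coordinate only.
sample2 = [[rng.gauss(2.0 if j == 0 else 0.0, 1.0) for j in range(d)] for _ in range(n)]
estimate = wpp_k1_random_search(sample1, sample2, n_directions=200, rng=rng)
print(estimate)
```

In this toy example the two laws differ by a translation of length 2 along one axis, so the population value of \(W_1\) is 2 and the printed estimate should land somewhat below or near that value.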
Estimation:
Theorem 1: Let \((\mu^{(1)}, \mu^{(2)})\) satisfy the Spiked Transport Model (STM). For any \(p \in [1,2]\), if \(\mu^{(1)}\) and \(\mu^{(2)}\) satisfy the \(T_p(\sigma^2)\) ineq., then
Before proving:
Prop. 3: Under the STM, \(\tilde W_{p,k}(\mu^{(1)}, \mu^{(2)}) = W_p(\mu^{(1)}, \mu^{(2)})\).
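Thm. 6: Let \(p \in [1,2]\). A meas. \(\mu \in \mathcal{P}_p(\mathbb{R}^d)\) (prob. with finite \(p\)th moment) satisfies \(T_p(\sigma^2)\) if and only if the r.v. \(W_p(\mu_n, \mu)\) is \(\sigma^2/n\)-subgaussian for all \(n\).
Prop. 5: Let \(U \in V_k(\mathbb{R}^d)\). For any \(p \in [1,2]\) and \(\sigma > 0\), if \(\mu\) satisfies \(T_p(\sigma^2)\), then so does \(\mu_U\).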
Proof for estimation:
First notice that
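\(\mathbb{E}\,|\hat W_{p,k} - W_p(\mu^{(1)}, \mu^{(2)})| \le \mathbb{E}\,\tilde W_{p,k}(\mu^{(1)}, \mu^{(1)}_n) + \mathbb{E}\,\tilde W_{p,k}(\mu^{(2)}, \mu^{(2)}_n)\)
Then we can focus on bounding \(\mathbb{E}\,\tilde W_{p,k}(\mu, \mu_n)\).
- We assume w.l.o.g. that \(\mu\) has mean 0, and \(\sigma = 1\).
- Consider \(Z_U := W_p(\mu_U, (\mu_n)_U)\)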
Lemma: \(\exists\) r.v. \(L\) s.t. for all \(U, V \in V_k(\mathbb{R}^d)\)
\(|Z_U - Z_V| \le L\, \|U - V\|_{\mathrm{op}}\)
and \(\mathbb{E} L \lesssim \sqrt{dp}\)
proof: Let X∼μ, then
- The process ZU is Lipschitz,
- By Thm. 6, \(Z_U\) is \(n^{-1}\)-subgaussian.
Now, using a standard \(\varepsilon\)-net argument over \(V_k(\mathbb{R}^d)\) we get
where \(N(V_k, \varepsilon, \|\cdot\|_{\mathrm{op}})\) is the covering number of \(V_k\) with resp. to the op. norm.
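For reference, one common way to organise such an \(\varepsilon\)-net argument is sketched below. It only uses the two facts listed above (Lipschitz in \(U\), and \(n^{-1}\)-subgaussian for each fixed \(U\)); the net \(\mathcal{N}_\varepsilon\) and the constant \(C\) are notation introduced here, and the display on the original slide may be stated differently.

```latex
% Sketch under stated assumptions: \mathcal{N}_\varepsilon is a minimal
% \varepsilon-net of V_k in operator norm, and C is a universal constant.
\begin{align*}
\mathbb{E} \sup_{U \in V_k} Z_U
  &\le \mathbb{E} \max_{V \in \mathcal{N}_\varepsilon} Z_V + \varepsilon\, \mathbb{E} L
     && \text{(Lipschitz: } Z_U \le Z_V + L \|U - V\|_{\mathrm{op}}\text{)} \\
  &\le \sup_{U \in V_k} \mathbb{E} Z_U
     + \mathbb{E} \max_{V \in \mathcal{N}_\varepsilon} \bigl(Z_V - \mathbb{E} Z_V\bigr)
     + \varepsilon\, \mathbb{E} L \\
  &\le \sup_{U \in V_k} \mathbb{E} Z_U
     + C \sqrt{\frac{\log N(V_k, \varepsilon, \|\cdot\|_{\mathrm{op}})}{n}}
     + \varepsilon\, \mathbb{E} L
     && \text{(maximal ineq. for subgaussian r.v.)}
\end{align*}
```

The first term is a \(k\)-dimensional estimation problem, the second carries the dependence on the covering number, and the third is controlled by the choice of \(\varepsilon\).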
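There exists a univ. const. \(c\) such that \(\log N(V_k, \varepsilon, \|\cdot\|_{\mathrm{op}}) \le dk \log\frac{c\sqrt{k}}{\varepsilon}\) for \(\varepsilon \in (0,1]\). Choosing \(\varepsilon = \sqrt{k}/n\) yields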
Finally we get,
Spike estimation:
Theorem 10: Let \(p \in [1,2]\). Assume \((\mu^{(1)}, \mu^{(2)})\) satisfies the STM and the \(T_p(\sigma^2)\) ineq. Let \(\hat{\mathcal{U}} := \mathrm{span}(\hat U)\), where
Then
Lower Bounds:
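Consider a compact metric space \(\mathcal{X}\) s.t.
\(c\,\varepsilon^{-d} \le N(\mathcal{X}, \varepsilon) \le C\,\varepsilon^{-d}\)
for all \(\varepsilon \le \mathrm{diam}(\mathcal{X})\).
With \(\mathcal{P} = \mathrm{Prob}(\mathcal{X})\) define
\(R(n, \mathcal{P}) := \inf_{\hat W} \sup_{\mu, \nu \in \mathcal{P}} \mathbb{E}_{\mu,\nu} |\hat W - W_p(\mu, \nu)|\).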
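Theorem 11: Let \(d > 2p > 2\) and \(\mathcal{X}\) with the cov. number as before. Then
\(R(n, \mathcal{P}) \ge C_{d,p} (n \log n)^{-1/d}\)
Theorem 4: \(\mathbb{E}\,|\hat W - W_p(\mu^{(1)}, \mu^{(2)})| \gtrsim \sigma \sqrt{d/n}\)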
Prop. 9: Assume \(d > 2p > 2\) and let \(m\) be a pos. integer. Let \(u = \mathrm{Unif}([m])\). \(\exists\) a random function \(F : [m] \to \mathcal{X}\) s.t. for any dist. \(q\) on \([m]\),
with prob. at least .9
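Proof: For the lower bound:
Use the cov. num. to get a set of points \(G_m = \{x_i\}_{i \in [m]}\) s.t. \(d(x_i, x_j) \gtrsim m^{-1/d}\) for \(i \ne j\). Then select \(F\) unif. at random from the set of bijections from \([m]\) to \(G_m\). Now, since
\(d(x, y)^p \gtrsim m^{-p/d}\, \mathbf{1}\{x \ne y\}\) for \(x, y \in G_m\),
we get for any coupling \(\pi\) of \(F_\# q\) and \(F_\# u\), with \((X, Y) \sim \pi\),
\(\int d(x, y)^p \, d\pi(x, y) \gtrsim m^{-p/d}\, \mathbb{P}[X \ne Y] \ge m^{-p/d}\, d_{TV}(q, u)\)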
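The last inequality uses the fact that any coupling of two distributions mismatches with probability at least their total variation distance. A small numerical check on a toy example is sketched below; the choice \(m = 4\) and the vector `q` are made up for illustration, and the two couplings shown (independent and maximal) are just two convenient instances.

```python
def tv_distance(q, u):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(q, u))


def mismatch_prob_independent(q, u):
    """P[X != Y] when X ~ q and Y ~ u are drawn independently."""
    return 1.0 - sum(a * b for a, b in zip(q, u))


def mismatch_prob_maximal(q, u):
    """P[X != Y] under a maximal coupling, which puts mass min(q_i, u_i) on the diagonal."""
    return 1.0 - sum(min(a, b) for a, b in zip(q, u))


m = 4
u = [1.0 / m] * m           # uniform distribution on [m]
q = [0.4, 0.3, 0.2, 0.1]    # an arbitrary distribution on [m]

tv = tv_distance(q, u)
print(tv, mismatch_prob_independent(q, u), mismatch_prob_maximal(q, u))
```

The maximal coupling attains the bound with equality, while the independent coupling is far from it, which is why the inequality is stated for every coupling.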
For the upper bound: ...
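Prop. 10: Fix \(n \in \mathbb{N}\) and a constant \(\delta \in [0, .1]\). Given \(m \in \mathbb{N}\), let \(D_m\) be the set of prob. \(q\) on \([m]\) satisfying \(\chi^2(q, u) \le 9\). Denote by \(D^-_{m,\delta}\) the subset of \(D_m\) where \(d_{TV}(q, u) \le \delta\) and by \(D^+_m\) the subset satisfying \(d_{TV}(q, u) \ge 1/4\). If \(m = \lceil C \delta^{-1} n \log n \rceil\) for a sufficiently large univ. \(C\) and \(n\) is sufficiently large, then
\(\inf_\psi \big\{ \sup_{q \in D^+_m} \mathbb{P}_q[\psi = 1] + \sup_{q \in D^-_{m,\delta}} \mathbb{P}_q[\psi = 0] \big\} \ge .9\)
where the inf is taken over all tests based on \(n\) samples.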
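To make the two classes in Prop. 10 concrete, the sketch below computes \(\chi^2(q, u)\) and \(d_{TV}(q, u)\) against the uniform distribution and reports which class a given \(q\) falls in. The helper `tilted`, which raises half of the atoms and lowers the other half, is a hypothetical family used only to produce examples; the thresholds 9 and 1/4 are the ones in the statement.

```python
def chi2_to_uniform(q):
    """chi^2(q, u) = sum_i (q_i - 1/m)^2 / (1/m) for u uniform on [m]."""
    m = len(q)
    return sum((qi - 1.0 / m) ** 2 for qi in q) * m


def tv_to_uniform(q):
    """Total variation distance between q and the uniform distribution on [m]."""
    m = len(q)
    return 0.5 * sum(abs(qi - 1.0 / m) for qi in q)


def classify(q, delta):
    """Report membership in D_m^+, D_{m,delta}^-, D_m only, or none of them."""
    if chi2_to_uniform(q) > 9.0:
        return "outside D_m"
    tv = tv_to_uniform(q)
    if tv >= 0.25:
        return "D_m^+"
    if tv <= delta:
        return "D_m,delta^-"
    return "D_m only"


def tilted(m, a):
    """Illustrative family (m even): half the atoms get (1+a)/m, the rest (1-a)/m."""
    half = m // 2
    return [(1.0 + a) / m] * half + [(1.0 - a) / m] * half


print(classify(tilted(8, 0.5), delta=0.1))        # far from uniform in TV
print(classify(tilted(8, 0.1), delta=0.1))        # close to uniform in TV
print(classify([1.0] + [0.0] * 19, delta=0.1))    # point mass: chi^2 too large
```

For this family \(d_{TV} = a/2\) and \(\chi^2 = a^2\), so both classes are non-empty even though the \(\chi^2\) constraint keeps every member at bounded distance from uniform.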
Proof of Thm. 11:
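Let \(A = \{|\hat W - W_p(F_\# q, F_\# u)| \ge \Delta_d\}\), with \(\Delta_d = \frac{1}{16} c^* m^{-1/d}\). Then
We def. the rand. test
\(\psi(X_1, \dots, X_n) := \mathbf{1}\{\hat W(F(X_1), \dots, F(X_n);\, F(Y_1), \dots, F(Y_n)) \le 2\Delta_d\}\)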
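Choosing \(m = \lceil C \delta^{-1} n \log n \rceil\) for suff. large \(C\) and applying Prop. 10 gives \(\sup_{\mu,\nu \in \mathcal{P}} \mathbb{P}_{\mu,\nu}[|\hat W - W_p(\mu,\nu)| \ge \Delta_d] \ge .8\), and Markov's ineq. yields the claim.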
Computational-Statistical Gap:
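Def. 5: Given a dist. \(D\) on \(\mathbb{R}^d\), for any sample size \(t > 0\) and \(f : \mathbb{R}^d \to [0,1]\), the oracle VSTAT(\(t\)) returns a value \(v \in [p - \tau, p + \tau]\), where \(p = \mathbb{E} f(X)\) for \(X \sim D\) and \(\tau = \frac{1}{t} \vee \sqrt{\frac{p(1-p)}{t}}\).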
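A minimal sketch of such an oracle is given below. The definition only constrains the answer to lie in the tolerance interval, so a real oracle may answer adversarially; answering uniformly at random inside the interval is an assumption made here for illustration. The function names are hypothetical, and the true expectation `p` is passed in directly rather than computed from \(f\) and \(D\).

```python
import math
import random


def vstat_tolerance(p, t):
    """Tolerance tau = max(1/t, sqrt(p(1-p)/t)) from the definition of VSTAT(t)."""
    return max(1.0 / t, math.sqrt(p * (1.0 - p) / t))


def vstat_oracle(p, t, rng):
    """Return some value in [p - tau, p + tau]; here drawn uniformly (an assumption)."""
    tau = vstat_tolerance(p, t)
    return rng.uniform(p - tau, p + tau)


rng = random.Random(0)
p_true = 0.5
for t in (10, 100, 10000):
    print(t, vstat_tolerance(p_true, t), vstat_oracle(p_true, t, rng))
```

The tolerance behaves like the standard deviation of an average of \(t\) Bernoulli(\(p\)) samples when \(p\) is away from 0 and 1, and like \(1/t\) when \(p\) is close to 0 or 1.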
Theorem 12: There exists a pos. univ. constant \(c\) s.t., for any \(d\), estimating \(W_1(\mu^{(1)}, \mu^{(2)})\) for \(\mu^{(1)}, \mu^{(2)}\) satisfying the STM with \(k = 1\) to accuracy \(\Theta(1/\sqrt{d})\) with prob. at least 2/3 requires at least \(2^{cd}\) queries to VSTAT(\(2^{cd}\)).