[Katsukokoiso & SORA]
[Hoogeboom et al, 2022]
[Corso et al, 2023]
\(X\sim p_0\) over \(\mathbb R^d\)
Degradation: \(t \in [0,T]\)
Generation/sampling: \(t \in [T,0]\)
Score function [Song et al, 2019; Ho et al, 2020]
[Euler-Maruyama]
Say \(X_t \sim \mathcal N(X_0,\sigma^2 I)\).
Then (Tweedie's formula): \(\nabla \ln p_t(X_t) = \frac{1}{\sigma^2}\left(\mathbb E[X_0|X_t] - X_t\right)\)
Denoisers: \(f_\theta(X_t) \approx \underset{f}{\arg\min} ~ \mathbb E \left[ \|f(X_t) - X_0\|^2_2\right]\)
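A quick numerical check of Tweedie's formula in a toy Gaussian model where both sides are available in closed form (the one-dimensional model here is illustrative, not from the talk):

```python
import numpy as np

# Toy model: X_0 ~ N(0, 1) and X_t = X_0 + sigma * z, so p_t = N(0, 1 + sigma^2).
sigma = 0.5
posterior_mean = lambda x: x / (1 + sigma**2)  # E[X_0 | X_t = x], closed form
score = lambda x: -x / (1 + sigma**2)          # grad_x log p_t(x), closed form

x_t = 1.3
tweedie = (posterior_mean(x_t) - x_t) / sigma**2  # right-hand side of Tweedie
print(np.isclose(tweedie, score(x_t)))            # True
```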
Motivation: Gradient flow \(dX_t = -\nabla f(X_t)\,dt\)
Forward discretization (GD): \(X_{k+1} = X_k - \gamma \nabla f(X_k)\)
Backward discretization (PPM): \(X_{k+1} = X_k - \gamma \nabla f(X_{k+1})\)
Rearranging, \(0=X_{k+1} - X_k + \gamma \nabla f(X_{k+1})\), i.e.
\( X_{k+1} = \underset{X}{\arg\min} \frac12 \|X-X_{k}\|^2_2 + \gamma f(X) = \text{prox}_{\gamma f}(X_k)\)
where \( \text{prox}_{\gamma f}(Y) \triangleq \underset{X}{\arg\min} \frac12 \|X-Y\|^2_2 + \gamma f(X) \)
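A minimal sketch of the two discretizations on \(f(x) = \frac12 x^2\), where \(L_f = 1\) and the prox has the closed form \(\text{prox}_{\gamma f}(y) = y/(1+\gamma)\) (the scalar example is illustrative):

```python
# GD (forward Euler) vs. PPM (backward Euler) on f(x) = 0.5 * x^2.
def gd_step(x, gamma):
    return x - gamma * x          # x_{k+1} = x_k - gamma * f'(x_k)

def ppm_step(x, gamma):
    return x / (1.0 + gamma)      # x_{k+1} = prox_{gamma f}(x_k), closed form

x_gd = x_ppm = 1.0
gamma = 3.0                       # > 2 / L_f = 2: GD oscillates and diverges
for _ in range(20):
    x_gd, x_ppm = gd_step(x_gd, gamma), ppm_step(x_ppm, gamma)
print(x_gd, x_ppm)                # |x_gd| has blown up; x_ppm is ~0
```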
GD: converges for \(\gamma < \frac2{L_f}\)
PPM: converges for any \(\gamma>0\), even for non-smooth \(f\)
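The same point for a non-smooth \(f(x) = |x|\): its \(\text{prox}_{\gamma f}\) is soft-thresholding, so PPM remains well defined and reaches the minimizer exactly (again an illustrative sketch):

```python
import numpy as np

# PPM on the non-smooth f(x) = |x|: prox_{gamma f} is soft-thresholding.
def soft_threshold(y, gamma):
    return np.sign(y) * np.maximum(np.abs(y) - gamma, 0.0)

x = 5.0
for _ in range(10):
    x = soft_threshold(x, 0.7)    # one PPM step per iteration
print(x)                          # hits the minimizer 0 exactly after ~8 steps
```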
Forward discretization: Score-based Sampling (DDPM) [Ho et al, 2020]
Backward discretization: Proximal Diffusion Algorithm (ProxDM)
Hybrid Diffusion Algorithm (ProxDM hybrid)
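A minimal sketch of the structural difference, assuming an Ornstein-Uhlenbeck forward process \(dX_t = -X_t\,dt + \sqrt{2}\,dW_t\) and hypothetical `score(x, t)` and `prox_neg_logp(y, t, lam)` oracles; the exact ProxDM and hybrid updates are those of Fang et al (NeurIPS 2025), which this only caricatures as explicit vs. implicit treatment of the score term:

```python
import numpy as np

def ddpm_step(x, t, gamma, score):
    # Forward (explicit) Euler-Maruyama step of the reverse SDE:
    # the score is evaluated at the current iterate x_k.
    z = np.random.randn(*x.shape)
    return x + gamma * (x + 2.0 * score(x, t)) + np.sqrt(2.0 * gamma) * z

def proxdm_step(x, t, gamma, prox_neg_logp):
    # Backward (implicit) step: the score is evaluated at the next iterate
    # x_{k+1}, which is exactly a proximal step on -log p_t, i.e. it returns
    # argmin_u 0.5 * ||u - y||^2 - 2 * gamma * log p_t(u).
    z = np.random.randn(*x.shape)
    y = (1.0 + gamma) * x + np.sqrt(2.0 * gamma) * z
    return prox_neg_logp(y, t, 2.0 * gamma)
```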
Theorem [Fang, Díaz, Buchanan, S.]
(informal)
To achieve \(\text{KL}(\text{target}\,\|\,\text{sample})\leq \epsilon\):
ProxDM requires \(N\gtrsim d/\sqrt{\epsilon}\)
ProxHybrid requires \(N\gtrsim d/\epsilon\)
DDPM requires \(N = \mathcal O(d/\epsilon)\) (vanilla) or \(\mathcal O(d^{3/4}/\sqrt{\epsilon})\) (accelerated) [Chen et al, 2022; Wu et al, 2024]
Sampling acceleration
Probability flows and ODEs (e.g. DDIM) [Song et al, 2020; Chen et al, 2023; ...]
DPM-Solver [Lu et al, 2022]
Higher-order solvers [Wu et al, 2024; ...]
Accelerations of different kinds [Song et al, 2023; Chen et al, 2025; ...]
Benefits of backward discretization of ODEs/SDEs
Optimization [Rockafellar, 1976; Beck and Teboulle, 2015; ...]
Langevin dynamics: PLA [Bernton, 2018; Pereyra, 2016; Wibisono, 2019; Durmus et al, 2018]
Forward-backward in the space of measures [Chen et al, 2018; Wibisono, 2025]
Score-based Sampling (DDPM): data-dependent (MMSE) denoiser
Proximal Diffusion Algorithm (ProxDM): data-dependent prox \(\approx f_\theta\)
Proximal Diffusion Algorithm (PDA)
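Concretely, the learned network slots in where the exact prox oracle appeared in the earlier sketch (same caveats as before; `f_theta(y, t)` is a hypothetical signature):

```python
import numpy as np

# PDA sketch: the learned proximal network f_theta replaces the exact
# prox oracle prox_{2 gamma (-log p_t)} in one sampling step.
def pda_step(x, t, gamma, f_theta):
    z = np.random.randn(*x.shape)
    y = (1.0 + gamma) * x + np.sqrt(2.0 * gamma) * z
    return f_theta(y, t)   # learned stand-in for the proximal step
```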
Theorem [Fang, Buchanan, S.]
Let \(f_\theta : \mathbb R^d\to\mathbb R^d\) be a network such that \(f_\theta (x) = \nabla \psi_\theta (x)\),
where \(\psi_\theta : \mathbb R^d \to \mathbb R\) is convex and differentiable (an ICNN).
Then,
1. Existence of regularizer
\(\exists ~R_\theta : \mathbb R^d \to \mathbb R\), not necessarily convex, such that \(f_\theta(x) \in \text{prox}_{R_\theta}(x)\),
2. Computability
We can compute \(R_{\theta}(x)\) by solving a convex problem
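A sketch of why both parts hold, assuming only \(f_\theta = \nabla \psi_\theta\) with \(\psi_\theta\) convex (constants and regularity conditions elided; see the LPN paper for the precise statement). If \(y = f_\theta(x)\), the prox optimality condition \(x - y \in \partial R_\theta(y)\) integrates, up to an additive constant, to
\[
R_\theta(y) = \psi_\theta^\ast(y) - \tfrac12\|y\|_2^2,
\qquad
\psi_\theta^\ast(y) = \max_x \left(\langle x, y\rangle - \psi_\theta(x)\right),
\]
and evaluating the conjugate \(\psi_\theta^\ast(y)\) is a convex problem.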
Proximal Matching Loss
Theorem [Fang, Buchanan, S.]
(PDA)
Other parametrizations & implementation details...
Learned Proximal Networks
\(f_\theta\)
[Rombach et al, 2022] "A woman with long blonde hair and a black top stands against a neutral background. She wears a delicate necklace. The image is a portrait-style photograph with soft lighting."
(10 steps)
"A man with curly hair and a beard, wearing a dark jacket, stands indoors. The background is blurred, showing a blue sign and warm lighting. The image style is a realistic photograph."(10 steps)
Zhenghan Fang
Sam Buchanan
Mateo Díaz
Fang et al, Beyond Scores: Proximal Diffusion Models, NeurIPS 2025.
Fang et al, Learned Proximal Networks for Inverse Problems, ICLR 2024.
Fang et al, ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion, arXiv 2025.
\( \text{prox}_{-\ln p}(Y) = \underset{X}{\arg\min} \frac12 \|X-Y\|^2_2 - \ln p(X) \)
\( = \underset{X}{\arg\max}~ p(X|Y) \quad \text{(MAP)}\)
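The second equality uses the (implicit) Gaussian observation model \(Y = X + Z\), \(Z \sim \mathcal N(0, I)\), under which \(-\frac12\|X-Y\|^2_2 = \ln p(Y|X)\) up to a constant:
\[
\underset{X}{\arg\min}\ \tfrac12\|X-Y\|^2_2 - \ln p(X)
= \underset{X}{\arg\max}\ \ln p(Y|X) + \ln p(X)
= \underset{X}{\arg\max}\ p(X|Y).
\]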
[Examples: learned denoiser and recovered regularizer \(R(\tilde{x})\)]