Writing up my calculations as a paper draft.
Benefits of iterative refinement for multi-scale distributions
Coarsening resolution reduces mode separation, and conditioning reduces log-Sobolev constants in general.
1. Directional Safety Constraints in Optimal Control
2. Mixture of experts for learning switching dynamical systems
"observations generated from system \(S_k\)
get routed to the same subset of experts"
3. Meeting Mrigank to go through his writeup tomorrow.
Epsilon-net based excess risk bounds for mixture-of-experts aggregation.
Depends on number of parameters of predictor
Here are some things I want to try for improving this result,
1. Continuous online optimization instead of discretization?
2. I want to read through the proofs of a path-length (cumulative variation) bound carefully.
3. Found a relevant article on regret bounds for "predictable sequences".
Maybe I can borrow some ideas?
Online Linear Optimization Setting
Mirror Descent, FTRL etc. guarantee \( \mathrm{Reg}_T = O(\sqrt{T}) \) regardless of the sequence.
Tight for adversarial sequences but pessimistic when the sequence has structure
Assume that the sequence is benign: well described by a predictable process
\[ M_t(x_1, \ldots, x_{t-1}) \] such that
\[ x_t = M_t + \delta_t \] where \( M_t \) is a fixed guess and \( \delta_t \) is small noise.
The learner uses \( M_t \) as an internal hint to compute a better action at each round.
Examples:
1. \( M_t = x_{t-1} \) (path-length),
2. \( M_t = \frac{1}{t-1}\sum_{s<t} x_s \) (empirical mean), or any auto-regressive model.
3. \( M_t = \sum_{s<t} \alpha_s x_s \), where \(\sum_{s<t}\alpha_s = 1\) ("fading memory statistics").
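To get a feel for how the three example predictors above compare, here is a minimal Python sketch that computes each hint from the history and accumulates \(\sum_t (x_t - M_t)^2\) for a scalar sequence. The function names, the convention \(M_1 = 0\), and the geometric weights with decay `gamma` (normalized to sum to one) are my own assumptions for illustration, not something fixed by the article.

```python
def path_length_hint(history):
    """M_t = x_{t-1}; returns 0.0 when there is no history (assumed convention)."""
    return history[-1] if history else 0.0


def empirical_mean_hint(history):
    """M_t = average of all past x_s; returns 0.0 when there is no history."""
    return sum(history) / len(history) if history else 0.0


def fading_memory_hint(history, gamma=0.5):
    """M_t = weighted average of past x_s with geometric weights (assumed choice).

    The most recent point gets weight proportional to 1, the one before it
    gamma, then gamma**2, and so on; weights are normalized to sum to one.
    """
    if not history:
        return 0.0
    n = len(history)
    weights = [gamma ** (n - 1 - s) for s in range(n)]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, history)) / total


def cumulative_deviation(xs, hint):
    """Sum over rounds of (x_t - M_t)^2, where M_t only sees x_1..x_{t-1}."""
    return sum((x - hint(xs[:t])) ** 2 for t, x in enumerate(xs))


if __name__ == "__main__":
    drifting = [0.1 * t for t in range(1, 11)]  # slowly increasing sequence
    for hint in (path_length_hint, empirical_mean_hint, fading_memory_hint):
        print(hint.__name__, cumulative_deviation(drifting, hint))
```

On a slowly drifting sequence the path-length hint tracks the drift with a one-step lag, while the empirical mean lags further behind, so its cumulative deviation is larger.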
Can we incorporate prior information on the evolution of sequence into regret analysis?
"Optimistic Mirror Descent" algorithm incorporates \( M_{t+1} \) into the update as if it were the true next move:
The main regret bound (Lemma 2) is:
\[ \mathrm{Reg}_T \leq \frac{(\text{Diam}(\mathcal{F}))^2}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \|x_t - M_t\|_*^2 \quad \xrightarrow{\text{optimize } \eta} \quad \mathrm{Reg}_T \leq c\,\sqrt{\sum_{t=1}^T \|x_t - M_t\|_*^2} \]
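As a quick check of the "optimize \(\eta\)" step, write \(D = \text{Diam}(\mathcal{F})\) and \(S_T = \sum_{t=1}^T \|x_t - M_t\|_*^2\) (shorthand introduced here only), and assume \(S_T > 0\) is known in advance so that \(\eta\) can be tuned to it. Minimizing the right-hand side over \(\eta > 0\):
\[ \frac{d}{d\eta}\left( \frac{D^2}{\eta} + \frac{\eta}{2} S_T \right) = -\frac{D^2}{\eta^2} + \frac{S_T}{2} = 0 \quad \Longrightarrow \quad \eta^\star = D\sqrt{\frac{2}{S_T}}, \qquad \frac{D^2}{\eta^\star} + \frac{\eta^\star}{2} S_T = \sqrt{2}\, D \sqrt{S_T} \]
Under these assumptions the constant works out to \(c = \sqrt{2}\,\text{Diam}(\mathcal{F})\). When \(S_T\) is not known ahead of time, an adaptive step size would be needed, which may change the constant.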
Pretend that \(M_{t+1}\) is the correct generator for \(x_{t+1}\)
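A minimal sketch of the two-step optimistic update, under assumptions that are mine rather than the notes': a Euclidean regularizer (so the mirror step is a projected gradient step), a feasible set that is an L2 ball of radius `radius`, and linear losses \(\langle f_t, x_t \rangle\). All function names are hypothetical.

```python
import math


def project_ball(v, radius=1.0):
    """Euclidean projection of v onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= radius:
        return list(v)
    return [radius * c / norm for c in v]


def optimistic_gradient_descent(xs, hints, eta, radius=1.0):
    """Play f_t using the hint M_t, then update the secondary iterate with x_t.

    f_t = Proj(g_{t-1} - eta * M_t)   (act as if M_t were the true loss vector)
    g_t = Proj(g_{t-1} - eta * x_t)   (correct with the loss actually observed)
    """
    g = [0.0] * len(xs[0])
    plays = []
    for x, m in zip(xs, hints):
        f = project_ball([gi - eta * mi for gi, mi in zip(g, m)], radius)
        plays.append(f)
        g = project_ball([gi - eta * xi for gi, xi in zip(g, x)], radius)
    return plays


def regret(xs, plays, radius=1.0):
    """Regret against the best fixed point of the ball for linear losses."""
    learner = sum(sum(fi * xi for fi, xi in zip(f, x)) for f, x in zip(plays, xs))
    total = [sum(col) for col in zip(*xs)]
    best = -radius * math.sqrt(sum(c * c for c in total))
    return learner - best


if __name__ == "__main__":
    xs = [[1.0, 0.0]] * 20
    perfect = regret(xs, optimistic_gradient_descent(xs, xs, eta=0.5))
    no_hint = regret(xs, optimistic_gradient_descent(xs, [[0.0, 0.0]] * 20, eta=0.5))
    print(perfect, no_hint)
```

With a zero hint the update reduces to plain projected online gradient descent, which makes it a convenient baseline for checking how much a good \(M_t\) helps.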
The authors also discuss "learning" \(M_t \in \mathcal{M}\), but that adds a \( \log |\mathcal{M}| \) cost to the regret bound.
1. Sent slides on Learned DA
2. Readings on cutting edge methods in Video-SR.
3.
\(\alpha = 0.95,\; \beta = 0.7,\; \lambda = 0.6,\; \omega(0) = 3.5,\; \rho = 0.25,\; \epsilon = 0.1\)
Varying \(\omega\) sinusoidally
Still working on theory
Each observation has imperfect information on the phase because of noise.
Tracking phase through history via a filtering algorithm can help, but windowing throws away information
Tracking amplitude and phase from the full history can be better
Window algorithm : estimate phase, amplitude from window (length \(w\))
EKF : track phase, amplitude as "states" and predict \(y_{t+1}\)
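A rough sketch of what the window estimator could look like, under a strong simplifying assumption that is not in the notes: the signal in the window is a single sinusoid \(y_t = A\sin(\omega t + \phi) + \text{noise}\) with a known, constant \(\omega\). The function name and the least-squares formulation are illustrative choices; the actual experiment varies \(\omega\) over time.

```python
import math


def fit_window_sinusoid(ys, ts, omega):
    """Least-squares fit of y ~ a*sin(omega*t) + b*cos(omega*t) over one window.

    Returns (amplitude, phase) such that y ~ amplitude * sin(omega*t + phase).
    Assumes omega is known and constant over the window (illustrative only).
    """
    s = [math.sin(omega * t) for t in ts]
    c = [math.cos(omega * t) for t in ts]
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(u * v for u, v in zip(s, c))
    sy = sum(u * y for u, y in zip(s, ys))
    cy = sum(u * y for u, y in zip(c, ys))
    det = ss * cc - sc * sc  # zero if the window cannot separate sin from cos
    a = (sy * cc - sc * cy) / det
    b = (ss * cy - sc * sy) / det
    # A*sin(wt + phi) = (A cos phi) sin(wt) + (A sin phi) cos(wt)
    return math.hypot(a, b), math.atan2(b, a)


if __name__ == "__main__":
    omega, w = 0.25, 10
    ts = list(range(40))
    ys = [2.0 * math.sin(omega * t + 0.3) for t in ts]
    print(fit_window_sinusoid(ys[-w:], ts[-w:], omega))
```

Only the last \(w\) samples enter the fit, which is exactly the information loss noted above: anything the earlier observations say about the phase is discarded, whereas a filter carries it forward in its state estimate.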
12/18
Given \( \{x_t^{ref}, y_t\}_{t=1}^T\), learn:
A forward model F that captures the dynamics of state evolution.
An analysis model G that assimilates the forecasted state and the new observation to produce an updated estimate (the analysis state). Formally, a conditional denoising diffusion model.
generated distribution with conditioning
At inference time, given \(x_0\) and \( \{y_1, y_2, \ldots \} \)
Experiment with different diffusion models and different ground truth dynamics
Supervised learning objective
Generative model explicitly incorporates dynamics through conditioning on forecast state.
prediction from reference
I propose that we enforce more regularity during training
analysis for time t from \(x_t^{F,ref}\)
Enforce additional forward constraints
Implicitly asks F, G to be aligned
Regularity for analysis model
Backward alignment.
Conditioned on \(y_t\), if the predicted state resembles the state at the next time step, then the generative model should still sample near \(x_t^{ref}\)
Learning objectives
Learn forward from data
Learn to analyze from data
forward and analysis should align
Analysis should be backwards compatible
These constraints can be approximately specified for the denoiser network.
I proposed an algorithm where forward dynamics and analysis step are jointly learned from (state,observation) trajectories.
Two main questions from last meeting:
Different diffusion-based DA systems learn different Bayesian posterior distributions. Hodyss et al., 2025 claim, based on simplified experimental settings (linear, Gaussian), that diffusion models that account for forward dynamics are better.
1. Climatology-trained diffusion DA
2. Cycle-trained diffusion DA
3. Hybrid diffusion DA
Method : Represent the filtering density \(p(x_t|y_1, \ldots, y_t)\) via a score-based diffusion model instead of a finite particle set.
Each DA step: propagate previous states through the known dynamics, retrain the score network on these samples to approximate the prior filtering density \(p(x_t|y_1, \ldots, y_{t-1})\).
Update the score function analytically from prior filtering density to posterior filtering density
Sampling: draw arbitrarily many samples from the filtering density with the reverse-time diffusion using the learned score.
Key : Dynamics are not learned. Score function is continuously updated and retrained through DA.
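For reference, the analytic update step above is consistent with Bayes' rule written at the level of scores. A sketch, assuming the usual state-space structure (\(y_t\) depends on the past only through \(x_t\)) and a known, differentiable likelihood \(p(y_t \mid x_t)\); how the likelihood term is weighted across diffusion time is not recorded in these notes and is left out here:
\[ \nabla_{x_t} \log p(x_t \mid y_1, \ldots, y_t) = \nabla_{x_t} \log p(x_t \mid y_1, \ldots, y_{t-1}) + \nabla_{x_t} \log p(y_t \mid x_t) \]
The normalizing constant \(p(y_t \mid y_1, \ldots, y_{t-1})\) does not depend on \(x_t\), so it drops out of the gradient. This is why only the prior score has to be learned from propagated samples.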
Journal of Computational Physics, Oct 2024
Training data: State–observation trajectories \(\{x_k, y_k\}_{k=1}^K\)
Learning Task: A conditional diffusion model of the smoothing posterior \(p(x_{1:K} \mid y_{1:K})\).
Contribution: Non-linear observation operators are handled by doing DA for the pair of augmented state \(z=(x,y)\) and observations \(y\).
State-Observation Augmented Diffusion (SOAD) model for nonlinear assimilation with unknown dynamics
Journal of Computational Physics, Oct 2025
- Reading more literature on analysis operators. Aiming to have a share-able document on Monday, Dec 1.
Both articles learn analysis operator in somewhat similar manner
- Applying to more TT' jobs now
- Assembling my application materials
- Still polishing my research statement
state evolution
observations
1. When \(\epsilon_t, \eta_t\) are Gaussian (and independent of the state iterates), the Kalman filter is optimal.
2. Analysis filter in KF is linear in the predicted state \(u_t^f\) and observation \(y_t\)
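To make the linearity in point 2 concrete, here is a minimal scalar sketch of the Kalman analysis step. The scalar setting and the names `h` (observation operator), `r` (observation-noise variance) and `p_f` (forecast-error variance) are assumptions made for illustration; the notes do not fix this notation.

```python
def kalman_analysis(u_f, p_f, y, h, r):
    """Scalar Kalman analysis step (illustrative, hypothetical notation).

    u_f : forecast (predicted) state
    p_f : forecast-error variance
    y   : observation, modeled as y = h * u + noise with variance r
    Returns the analysis state, the analysis variance, and the gain.
    """
    gain = p_f * h / (h * h * p_f + r)
    u_a = u_f + gain * (y - h * u_f)   # equals (1 - gain*h)*u_f + gain*y
    p_a = (1.0 - gain * h) * p_f
    return u_a, p_a, gain


if __name__ == "__main__":
    print(kalman_analysis(u_f=0.0, p_f=1.0, y=2.0, h=1.0, r=1.0))
```

Because the gain depends only on the variances and not on \(u_t^f\) or \(y_t\), the analysis state is a fixed linear combination of the two. A learned analysis operator would relax exactly this restriction.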
Goal : improve analysis step
Suppose we have a batch of paired state, observations
$$(u_t^{ref}, y_t)$$
where \(u_t^{ref}\) is the reference state at time t, obtained from, e.g.,
1. accurate but computationally intensive high-fidelity numerical solvers,
2. high-quality reanalysis products (e.g., ERA5 for global atmospheric circulation).
Prediction step
Analysis step 1
Analysis step 2
deterministic, coarse
probabilistic, finer
where \(r_t \sim p_{\phi}(y_t, \tilde{u}_t)\)