\( \text{Contents of this Presentation:}\)
\( \text{Theme of Research so far:}\)
Accepted first-author papers are shown in bold.
\( \textbf{Your Text Encoder Can Be An Object-Level Watermarking Controller}\)
Train Single Token Text Embeddings for Watermarking
\( \textbf{Your Text Encoder Can Be An Object-Level Watermarking Controller}\)
Latent Loss for Minimal Trajectory Adjustment to Enable Imperceptible Watermarking
\( \textbf{Your Text Encoder Can Be An Object-Level Watermarking Controller}\)
Same Token Embedding can be used for both Full-image and Object-Level Watermarking
(utilizing Cross-Attention Maps)
\( \textbf{Your Text Encoder Can Be An Object-Level Watermarking Controller}\)
\( \textbf{Latent Diffusion Unlearning: Protecting against Unauthorized Personalization through Trajectory Shifted Perturbations}\)
Shift the Start of the Diffusion Denoising Trajectory for Latent-Level Unlearnable Samples.
\( \textbf{Latent Diffusion Unlearning: Protecting against Unauthorized Personalization through Trajectory Shifted Perturbations}\)
Shortcut Diffusion Models for Unlearnable Perturbation Propagation
\( \textbf{Latent Diffusion Unlearning: Protecting against Unauthorized Personalization through Trajectory Shifted Perturbations}\)
[Figure: artifact-free unlearnable samples vs. a visible perturbation overlay]
\( \textbf{Latent Diffusion Unlearning: Protecting against Unauthorized Personalization through Trajectory Shifted Perturbations}\)
Resistance to Strong DiffPure Attack
\( \textbf{Submitted First-Author Papers - Under Review:}\)
\( \textbf{Controlling Hallucinations in Diffusion Models: A Case Study on Chess}\)
Mahesh Bhosale*, Naresh Kumar Devulapally*, Vishnu Suresh Lokhande, David Doermann
\( \textbf{Submitted to NeurIPS 2025:}\)
Variance Learning and Score Amplification for Hallucination Reduction
\( \textbf{Teaching Experience:}\)
\( \text{Course Instructor: } \textbf{Computer Vision and Image Processing}\)
\( \text{Summer 2025} \)
\( \text{Number of Students: } \textbf{71}\)
\( \text{Variational AutoEncoders} \)
\( \text{Diffusion Models}\)
\( \text{Generative Adversarial Networks}\)
\( \text{Reading Group Summer 2025} \)
\( \text{Discuss Recent GenAI papers}\)
\( \text{PhD Students at UB}\)
Generative Modeling Principles
• Generative modeling aims to learn a model \( p_\theta(x) \) that closely matches the real data distribution \( p_{\text{data}}(x) \) by minimizing their KL divergence or equivalently maximizing log-likelihood.
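Concretely, since \( \mathbb{E}_{x \sim p_{\text{data}}}[\log p_{\text{data}}(x)] \) does not depend on \( \theta \):
\[
D_{\mathrm{KL}}\big(p_{\text{data}} \,\|\, p_\theta\big) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_{\text{data}}(x)\right] - \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_\theta(x)\right],
\]
so minimizing the KL divergence over \( \theta \) is equivalent to maximizing the expected log-likelihood \( \mathbb{E}_{x \sim p_{\text{data}}}[\log p_\theta(x)] \).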
Forward Diffusion
Reverse Diffusion
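The standard (DDPM-style) forward and reverse processes, stated for reference:
\[
q(x_t \mid x_{t-1}) = \mathcal{N}\big(\sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big), \qquad p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(\mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\big).
\]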
Training via ELBO
Noise Matching Term
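The noise-matching term reduces to the standard simple objective, with \( x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon \):
\[
\mathcal{L}_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}\left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right].
\]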
Parallel Decoding
Fixed-Time Sampling
Autoregressive Formulation in Traditional LLMs
Each token prediction depends on previous tokens (factorization below).
Sequential sampling
No parallelism
Error compounding
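The autoregressive factorization behind these properties:
\[
p_\theta(x) = \prod_{i=1}^{L} p_\theta\big(x_i \mid x_{<i}\big),
\]
sampled left-to-right, one token at a time.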
Research Question:
Is the autoregressive paradigm the only viable path to achieving the intelligence exhibited by LLMs?
Large Language Diffusion Models
Pre-training
Supervised Finetuning
Sampling
Large Language Diffusion Models - Contributions
Large Language Diffusion Models - Loss
LLaDA’s core is a mask predictor: a model \( p_\theta(\cdot \mid x_t) \) that takes a masked sequence \( x_t \) and predicts all masked tokens (set \( M \)) simultaneously.
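The pre-training objective is a cross-entropy computed only over the masked positions, weighted by the masking ratio \( t \):
\[
\mathcal{L}(\theta) = -\,\mathbb{E}_{t,\, x_0,\, x_t}\left[ \frac{1}{t} \sum_{i \in M} \log p_\theta\big(x_0^i \mid x_t\big) \right].
\]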
Pre-training on 2.3 trillion tokens took 0.13 million H800 GPU hours.
Large Language Diffusion Models - SFT Loss
\( L' \): the dynamic length of the response, which varies across examples.
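A sketch of the SFT objective: identical in form to the pre-training loss, but only response tokens \( r_t \) are masked while the prompt \( p_0 \) stays visible:
\[
\mathcal{L}_{\text{SFT}}(\theta) = -\,\mathbb{E}_{t,\, p_0,\, r_0,\, r_t}\left[ \frac{1}{t} \sum_{i=1}^{L'} \mathbf{1}\!\left[r_t^i \text{ masked}\right] \log p_\theta\big(r_0^i \mid p_0, r_t\big) \right].
\]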
Large Language Diffusion Models - Inference
Initialization: The input sequence is constructed as [prompt tokens] + [MASK tokens] of target length.
Block-wise Iterative Denoising: The generation proceeds block-by-block. At each step, the model predicts logits for all masked positions in the current block.
Confidence-based Remasking Loop: At each step, high-confidence predictions are committed while low-confidence positions are re-masked; the loop repeats for a fixed number of steps, enabling parallel yet progressive refinement of the full sequence.
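A minimal Python sketch of this loop; the `model` interface, the required `mask_id`, and the per-step unmasking schedule are illustrative assumptions, not LLaDA's exact implementation:

```python
import torch

@torch.no_grad()
def llada_generate(model, prompt_ids, mask_id, gen_len=128,
                   block_len=32, steps_per_block=8):
    # Initialization: [prompt tokens] + [MASK tokens] of target length.
    x = torch.cat([prompt_ids,
                   torch.full((gen_len,), mask_id, dtype=torch.long)])
    start = prompt_ids.numel()
    # Block-wise iterative denoising: generate block by block.
    for b0 in range(start, start + gen_len, block_len):
        b1 = min(b0 + block_len, start + gen_len)
        for step in range(steps_per_block):
            logits = model(x.unsqueeze(0))[0]        # assumed: (seq_len, vocab)
            conf, pred = logits.softmax(-1).max(-1)  # per-position confidence
            cand = x == mask_id
            cand[:b0] = False
            cand[b1:] = False                        # only this block's masks
            n = int(cand.sum())
            if n == 0:
                break
            # Confidence-based remasking: commit only the k most confident
            # predictions this step; the rest remain masked.
            k = max(1, n // (steps_per_block - step))
            conf = conf.masked_fill(~cand, float("-inf"))
            top = conf.topk(k).indices
            x[top] = pred[top]
    return x
```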
Evaluation Metrics
Scalability
Performance compared to baselines
Reversal Reasoning
Limitations: Inference Time (No KV Caching)
Accelerating Diffusion Large Language Models with SlowFast Sampling
LLaDA Acc: 50.19%, LLaMA: 41.09% (GSM8K)
LLaDA Acc: 75.21%, LLaMA: 71.42% (GSM8K)
Problem Statement
Motivation
Accelerate Diffusion Training
Key Insight
Posterior Under Miscibility:
With standard (miscible) diffusion, at high noise levels the posterior is nearly uniform over the training images: \( p(x_0 = x_i \mid x_t) \approx \text{const.} \) for all \( i \).
Weak learning signal.
Noise Prediction in Vanilla DDIM:
\( \epsilon_\theta(x_t, t) \) is trained to recover \( \epsilon \) from \( x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \),
where \( \epsilon \sim \mathcal{N}(0, I) \) and \( \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \).
Immiscible Diffusion
Use linear assignment to optimally match each image in the batch with a noise sample:
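This is a standard linear assignment problem over the batch, with (e.g.) an \( L_2 \) cost; \( \pi \) ranges over permutations of the batch indices:
\[
\pi^\ast = \arg\min_{\pi} \sum_{i=1}^{B} \left\| x_i - \epsilon_{\pi(i)} \right\|_2, \qquad \epsilon_i \leftarrow \epsilon_{\pi^\ast(i)}.
\]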
Conditional Distribution After Assignment:
\( p(x_0 = x_i \mid x_t) \propto f\big( \| x_t - \sqrt{\bar{\alpha}_t}\, x_i \| \big) \)
\( f(\cdot) \): Decaying function (e.g., Gaussian)
Each image diffuses to a local region in noise space
Posterior Becomes Informative
The posterior is no longer uniform; it is peaked around the assigned images.
Noise Prediction in Immiscible Diffusion
Model now learns to predict noise that corresponds to a local cluster of images.
Enables strong gradients even at high noise levels.
Batch Assignment Algorithm
Inputs:
- \( x_b \): batch of images
- \( \epsilon_b \): batch of noise
- \( t_b \): batch of diffusion steps
Output:
- \( \epsilon_b \): reassigned noise batch, matched image-to-noise by the assignment
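A minimal sketch of the assignment step, assuming `scipy` is available; `immiscible_assign` is an illustrative name, not the paper's code:

```python
import torch
from scipy.optimize import linear_sum_assignment

def immiscible_assign(x_b, eps_b):
    """Reassign noise within the batch so each image gets nearby noise."""
    # Pairwise L2 distances between flattened images and noise samples.
    cost = torch.cdist(x_b.flatten(1), eps_b.flatten(1))   # (B, B)
    # Linear (Hungarian) assignment minimizing the total distance.
    _, col = linear_sum_assignment(cost.detach().cpu().numpy())
    return eps_b[torch.as_tensor(col, device=eps_b.device)]

# Usage in a training step (alpha_bar: cumulative alphas, t_b: timesteps):
# eps_b = immiscible_assign(x_b, torch.randn_like(x_b))
# a = alpha_bar[t_b].view(-1, 1, 1, 1)
# x_t = a.sqrt() * x_b + (1 - a).sqrt() * eps_b
```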
Contributions
Motivation
Transformers: \( O(N^2) \) compute, autoregressive inference.
RNNs: constant-time inference but poor trainability
Goal: combine the best of both.
State Space Models (SSMs) + fast scan + physics-inspired dynamics = efficient LLMs
Sequence Modeling
Ideal Properties
\( O(1) \) inference.
\( O(N) \) training with parallelism
Linearly scalable with sequence length
Linear State-Space Models (SSMs)
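The underlying continuous-time linear system, with hidden state \( h(t) \) and input \( u(t) \):
\[
h'(t) = A\, h(t) + B\, u(t), \qquad y(t) = C\, h(t) + D\, u(t).
\]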
Discretization via Euler / ZOH
Takeaway: Converts continuous dynamics to discrete steps
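Under zero-order hold (ZOH) with step size \( \Delta \), the discrete parameters are:
\[
\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\big(\exp(\Delta A) - I\big)\, \Delta B.
\]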
Recurrent Update
Efficient \( O(1) \) inference
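The resulting recurrence performs one constant-cost state update per token:
\[
h_k = \bar{A}\, h_{k-1} + \bar{B}\, u_k, \qquad y_k = C\, h_k.
\]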
Convolutional Reformulation
Training can be parallelized as convolution
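Unrolling the recurrence yields a single long convolution with a precomputable kernel:
\[
\bar{K} = \big(C\bar{B},\; C\bar{A}\bar{B},\; \dots,\; C\bar{A}^{N-1}\bar{B}\big), \qquad y = u * \bar{K}.
\]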
Mamba
Oscillatory State Space Models (LinOSS)
Based on forced harmonic oscillators: \( y''(t) = -A\, y(t) + B\, u(t) \)
Introduce auxiliary state \( z = y' \) to obtain an equivalent first-order system: \( z' = -A\, y + B\, u, \quad y' = z \)
Takeaway: Introduces second-order dynamics to capture oscillations
Implicit Discretization of LinOSS
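A sketch of one implicit (backward-Euler-style) step on the first-order system above; with diagonal \( A \) the solve is closed-form. This states the general implicit scheme, not necessarily LinOSS's exact equations:
\[
z_k = z_{k-1} + \Delta t\,\big({-}A\, y_k + B\, u_k\big), \qquad y_k = y_{k-1} + \Delta t\, z_k.
\]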
LinOSS
LinOSS - Why do the results matter?
LinOSS enables long-sequence modeling at scale.
Future Research on Large Language Diffusion Models
Personalization:
References: