\begin{abstract}

We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when only a few generation time steps are used.
Just as IRL trains a policy using a reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using a log probability density estimated from training data.
Because we represent the log density with an energy-based model (EBM), our approach reduces to the joint training of a diffusion model and an EBM.
Our IRL formulation, named \textbf{Diffusion by Maximum Entropy IRL} (DxMI), is a minimax problem that reaches equilibrium when both models converge to the data distribution.
Entropy maximization plays a key role in DxMI, encouraging exploration by the diffusion model and ensuring convergence of the EBM.
As a subroutine of DxMI, we also propose \textbf{Diffusion by Dynamic Programming} (DxDP), a novel reinforcement learning algorithm for diffusion models.
DxDP makes the diffusion model update in DxMI efficient by recasting the original problem as an optimal control problem in which value functions replace back-propagation through time.
Our empirical studies show that diffusion models fine-tuned with DxMI can generate high-quality samples in as few as 4 to 10 steps.
DxMI also enables training an EBM without MCMC, which stabilizes EBM training dynamics and improves anomaly detection performance.

\end{abstract}
