\begin{abstract}
We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small.
Similar to how IRL trains a policy based on the reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data.
Since we employ an energy-based model (EBM) to represent the log density, our approach reduces to the joint training of a diffusion model and an EBM.
Our IRL formulation, named \textbf{Diffusion by Maximum Entropy IRL} (DxMI), is a minimax problem that reaches equilibrium when both models converge to the data distribution.
Entropy maximization plays a key role in DxMI, facilitating exploration by the diffusion model and ensuring the convergence of the EBM.
We also propose \textbf{Diffusion by Dynamic Programming} (DxDP), a novel reinforcement learning algorithm for diffusion models, as a subroutine in DxMI.
DxDP makes the diffusion model update in DxMI efficient by transforming the original problem into an optimal control formulation where value functions replace back-propagation in time.
Our empirical studies show that diffusion models fine-tuned using DxMI can generate high-quality samples in as few as 4 or 10 generation steps.
Additionally, DxMI enables the training of an EBM without MCMC, stabilizing EBM training dynamics and enhancing anomaly detection performance.
\end{abstract}