
\begin{abstract}

\glspl{SGM} have demonstrated exceptional synthesis outcomes across various tasks. However, the design space of the forward diffusion process remains largely unexplored, and existing designs often rely on physical heuristics or simplifying assumptions. Drawing on insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in \glspl{SGM} that ensures convergence to the desired target distribution. We show that several existing \glspl{SGM} can be viewed as special cases of our framework. Building on this recipe, we introduce \gls{\nsmn}, which performs score-based modeling in an augmented space with auxiliary variables akin to physical phase space. Empirically, \gls{\nsmn} achieves superior sample quality and an improved speed-quality trade-off compared to various competing approaches on established image synthesis benchmarks. Notably, \gls{\nsmn} achieves sample quality comparable to state-of-the-art \glspl{SGM} (FID: \textbf{2.10} for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of \gls{\nsmn} to conditional synthesis using pre-trained score networks, offering an appealing alternative as an \gls{SGM} backbone for future advancements. Code and model checkpoints are available at \url{https://github.com/mandt-lab/PSLD}.

\end{abstract}
