\begin{abstract}
Diffusion models are widely used in state-of-the-art image, video, and audio generation. Score-based diffusion models stand out among these methods; they require estimating the score function of the input data distribution. In this study, we present a theoretical framework for analyzing two-layer neural network-based diffusion models by reframing score matching and denoising score matching as convex optimization problems. Whereas existing diffusion theory is mainly asymptotic, we characterize the exact predicted score function and establish convergence results for neural network-based diffusion models with finite data. This work contributes to understanding what neural network-based diffusion models learn in non-asymptotic settings.
%Surprisingly, the predicted score function may not fit the data outside of its observed support.
%the asymptotic convergence theory even with simple two-layer neural networks.
% We characterize the predicted score function and establish
% We present the exact
% We show that the global optimum of the score matching objective can be attained by solving a simple convex program. Specifically, for univariate training data, we establish that the Langevin diffusion process through the learned neural network model converges in the Kullback-Leibler (KL) divergence to either a Gaussian or a Gaussian-Laplace distribution when the weight decay parameter is set appropriately.
% Our convex programs alleviate issues in computing the Jacobian and also extends to multidimensional score matching.
\end{abstract}