1: \begin{abstract}
2: We propose the {\it particle dual averaging} (PDA) method, which generalizes the dual averaging method in convex optimization to the optimization over probability distributions with quantitative runtime guarantee. The algorithm consists of an inner loop and outer loop: the inner loop utilizes the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the outer loop. The method can thus be interpreted as an extension of the Langevin algorithm to naturally handle \textit{nonlinear} functional on the probability space. An important application of the proposed method is the optimization of neural network in the \textit{mean field} regime, which is theoretically attractive due to the presence of nonlinear feature learning, but quantitative convergence rate can be challenging to obtain. By adapting finite-dimensional convex optimization theory into the space of distributions, we analyze PDA in regularized empirical / expected risk minimization, and establish \textit{quantitative} global convergence in learning two-layer mean field neural networks under more general settings. Our theoretical results are supported by numerical simulations on neural networks with reasonable size.
3: \end{abstract}
4: