\begin{abstract} % <- trailing '%' for backward compatibility of .sty file

We study minimax optimization problems defined over infinite-dimensional function classes.
In particular, we restrict the functions to the class of overparameterized two-layer neural networks and study (i) the convergence of the gradient descent-ascent algorithm and (ii) representation learning in the neural network.
As an initial step, we consider the minimax optimization problem that arises when a functional equation defined by conditional expectations is estimated via adversarial estimation; in this problem, the objective function is quadratic in the functional space.
For this problem, we establish convergence in the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics.
In this regime, gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures on the neural network parameters.
We prove that the Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a sublinear rate of $\cO(T^{-1} + \alpha^{-1})$, and that it additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex.
Here $T$ denotes the time and $\alpha$ is a scaling parameter of the neural network.
In terms of representation learning, our results show that the feature representation induced by the neural network may deviate from its initial value by a magnitude of $\cO(\alpha^{-1})$, as measured by the Wasserstein distance.
Finally, we apply our general results to concrete examples, including policy evaluation, nonparametric instrumental variable regression, and asset pricing.

\end{abstract}
