\begin{abstract} %   <- trailing '%' for backward compatibility of .sty file

We study minimax optimization problems defined over infinite-dimensional function classes.
In particular, we restrict the functions to the class of overparameterized two-layer neural networks and study (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural network.
As an initial step, we consider the minimax optimization problem that arises when a functional equation defined by conditional expectations is estimated adversarially, where the objective function is quadratic in the functional space.
For this problem, we establish convergence under the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics.
Under this regime, gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters.
We prove that the Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a sublinear rate of $\cO(T^{-1} + \alpha^{-1})$, and that it additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex.
Here, $T$ denotes time and $\alpha$ is a scaling parameter of the neural network.
In terms of representation learning, our results show that the feature representation induced by the neural network can deviate from the initial one by a magnitude of $\cO(\alpha^{-1})$, measured in terms of the Wasserstein distance.
Finally, we apply our general results to concrete examples including policy evaluation, nonparametric instrumental variable regression, and asset pricing.

\end{abstract}