\begin{abstract}
It is currently known how to characterize the functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks), no tight characterization has yet been achieved, despite significant developments.

We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., a small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality.
Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$.

Our main results characterize a hierarchical property, the merged-staircase property, that is both {\it necessary and nearly sufficient} for learning in this setting.
We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) cannot learn efficiently. The key tools are a new ``dimension-free'' dynamics approximation result that applies to functions defined on a latent space of low dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions.
\end{abstract}