\begin{abstract}
Modern neural networks are often quite wide, which incurs large memory and computation costs. It is thus of great interest to train narrower networks. However, training narrow neural nets remains a challenging task.
We ask two theoretical questions: Can narrow networks be as expressive as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than $n$ neurons, where $n$ is the sample size, when the activation is smooth.
First, we prove that as long as the width $m$ satisfies $m \geq 2n/d$ (where $d$ is the input dimension), the network's expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.
Second, we identify a nice local region with no local minima or saddle points.
Nevertheless, it is not clear whether gradient descent can stay in this nice region.
Third, we consider a constrained optimization formulation whose feasible region is the nice local region, and prove that every Karush--Kuhn--Tucker (KKT) point is a nearly global minimizer.
It is expected that projected gradient methods converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work.
Extensive numerical experiments show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.
\end{abstract}