\begin{abstract}
Modern neural networks are often quite wide, which incurs large memory and computation costs. It is thus of great interest to train narrower networks. However, training narrow neural networks remains a challenging task.
We ask two theoretical questions: Can narrow networks be as expressive as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for one-hidden-layer networks with fewer than $n$ neurons, where $n$ is the sample size, when the activation is smooth.
First, we prove that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), the network has strong expressivity, i.e., there exists at least one global minimizer with zero training loss.
Second, we identify a nice local region with no local minima or saddle points.
Nevertheless, it is not clear whether gradient descent can stay in this nice region.
Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
We expect projected gradient methods to converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work.
Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.
\end{abstract}