
\begin{abstract}

Modern neural networks are often very wide, which incurs large memory and computation costs. It is therefore of great interest to train narrower networks. However, training narrow neural networks remains a challenging task.

We ask two theoretical questions: Can narrow networks be as expressive as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than $n$ (sample size) neurons when the activation is smooth.

First, we prove that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), the network's expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.

Second, we identify a nice local region with no local minima or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice region.

Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.

It is expected that projected gradient methods converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work.

Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.

\end{abstract}