abstract:d5e6bf7456c365a4.tex

\begin{abstract}
Understanding deep neural networks has been a major research objective in
recent years, with notable theoretical progress. A focal point of
those studies stems from the success of excessively large networks,
which defy the classical wisdom of uniform convergence and learnability. We
study empirically the layer-wise functional structure of overparameterized deep
models. We provide evidence for the heterogeneous characteristics of layers. To
do so, we introduce the notion of robustness to post-training
\emph{re-initialization} and \emph{re-randomization}. We show that the layers can
be categorized as either ``\robust'' or ``\critical''. Resetting the \robust
layers to their initial values has no negative consequence, and in many cases
they barely change throughout training. In contrast, resetting the
\critical layers completely destroys the predictor and the performance drops to
chance. Our study provides further evidence that mere parameter counting or
norm accounting is too coarse for studying generalization of deep models, and
flatness or robustness analysis of the models needs to respect the network
architectures.
\end{abstract}