\begin{abstract}
Understanding deep neural networks has been a major research objective in
recent years, with notable theoretical progress. A focal point of
those studies is the success of excessively large networks,
which defy the classical wisdom of uniform convergence and learnability. We
empirically study the layer-wise functional structure of overparameterized deep
models and provide evidence for the heterogeneous characteristics of layers. To
do so, we introduce the notion of robustness to post-training
\emph{re-initialization} and \emph{re-randomization}. We show that the layers can
be categorized as either ``\robust'' or ``\critical''. Resetting the \robust
layers to their initial values has no negative consequence, and in many cases
they barely change throughout training. In contrast, resetting the
\critical layers completely destroys the predictor and the performance drops to
chance. Our study provides further evidence that mere parameter counting or
norm accounting is too coarse for studying the generalization of deep models, and
that flatness or robustness analysis of the models needs to respect the network
architectures.
\end{abstract}