\begin{abstract}
We analyze architectural features of Deep Neural Networks (DNNs) using the so-called Neural Tangent Kernel (NTK), which describes the training and generalization of DNNs in the infinite-width setting.
In this setting, we show that for fully-connected DNNs, as the depth grows, two regimes appear: \textit{order}, where the (scaled) NTK converges to a constant, and \textit{chaos}, where it converges to a Kronecker delta. Extreme order slows down training, while extreme chaos hinders generalization.
With the scaled ReLU as the nonlinearity, the network ends up in the ordered regime.
In contrast, Layer Normalization brings the network into the chaotic regime. We observe a similar effect for Batch Normalization (BN) applied after the last nonlinearity.
We uncover the same order and chaos regimes in Deep Deconvolutional Networks (DC-NNs). Our analysis explains the appearance of so-called checkerboard patterns and border artifacts. Moving the network into the chaotic regime prevents checkerboard patterns. We propose a graph-based parametrization that eliminates border artifacts and, finally, introduce a new layer-dependent learning rate to improve the convergence of DC-NNs.
We illustrate our findings on DCGANs: the ordered regime leads to a collapse of the generator to a checkerboard mode, which can be avoided by tuning the nonlinearity to reach the chaotic regime. As a result, we are able to obtain good-quality samples for DCGANs without BN.
\end{abstract}