\begin{abstract}
Proper initialization of weights is crucial for the effective
training and fast convergence of deep neural networks (DNNs).
Prior work in this area has mostly focused on
{\em balancing the variance among weights per layer} to maintain the
stability of (i) the input data propagated forward through the network
and (ii) the loss gradients propagated backward.
This prevalent heuristic, however, is agnostic to dependencies among
gradients across layers and captures only first-order effects.
In this paper, we propose and discuss an initialization principle
based on a {\em rigorous estimation of the global curvature of weights across
layers}, obtained by approximating and controlling the norm of their Hessian
matrix. The proposed approach is more systematic than per-layer variance
balancing and recovers previous results for {\em smooth} activation functions,
{\em dropout}, and {\em ReLU}.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks
confirm that tracking the Hessian norm is a useful diagnostic tool that helps
initialize weights more rigorously.
\end{abstract}
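As a sketch of the variance-balancing heuristic referred to above, stated in the standard notation of Glorot and Bengio rather than in this paper's own (the fan-in $n_{\mathrm{in}}$ and fan-out $n_{\mathrm{out}}$ of a layer are symbols we introduce here): for a layer with i.i.d.\ zero-mean weights $W_{ij}$ and activations in their linear regime, keeping the forward activation variance constant requires $n_{\mathrm{in}}\,\mathrm{Var}(W_{ij}) = 1$, keeping the backward gradient variance constant requires $n_{\mathrm{out}}\,\mathrm{Var}(W_{ij}) = 1$, and the usual compromise, together with its ReLU counterpart due to He et al., is
\[
  \mathrm{Var}(W_{ij}) = \frac{2}{n_{\mathrm{in}} + n_{\mathrm{out}}},
  \qquad
  \mathrm{Var}(W_{ij}) = \frac{2}{n_{\mathrm{in}}}
  \quad\mbox{(ReLU, forward condition)}.
\]
Each condition constrains a single layer in isolation, which illustrates why such rules cannot account for dependencies among gradients across layers.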

16:
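The Hessian norm that the abstract proposes to track can be estimated without forming the Hessian. The following is a minimal Python/NumPy sketch of one standard approach: power iteration on Hessian-vector products, where each product is approximated by a central finite difference of the gradient. It is not necessarily the estimator used in this paper. The function names `hvp_fd` and `hessian_spectral_norm`, the step size `eps`, and the iteration count are illustrative assumptions. `grad_fn` stands for any function that returns the loss gradient at a flattened weight vector.

```python
import numpy as np


def hvp_fd(grad_fn, w, v, eps=1e-4):
    """Approximate the Hessian-vector product H(w) @ v by a central
    finite difference of the gradient along direction v."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)


def hessian_spectral_norm(grad_fn, w, iters=100, seed=0):
    """Estimate the spectral norm (largest absolute eigenvalue) of the
    Hessian at w by power iteration on finite-difference HVPs."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)  # start from a random unit direction
    estimate = 0.0
    for _ in range(iters):
        hv = hvp_fd(grad_fn, w, v)
        estimate = float(np.linalg.norm(hv))  # ||H v|| for unit v
        if estimate == 0.0:  # Hessian annihilates v; stop early
            break
        v = hv / estimate  # renormalise the iterate
    return estimate


if __name__ == "__main__":
    # Toy check: the quadratic loss 0.5 * w^T A w has constant Hessian A,
    # built here with known eigenvalues 5, -2, 1, 0.5, 0.1.
    rng = np.random.default_rng(1)
    q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
    A = q @ np.diag([5.0, -2.0, 1.0, 0.5, 0.1]) @ q.T
    w0 = rng.standard_normal(5)
    print(hessian_spectral_norm(lambda w: A @ w, w0))
```

For the quadratic toy loss the gradient is $Aw$, so the estimate should approach the largest absolute eigenvalue of $A$. For a real network one would instead pass a function that returns the network's flattened loss gradient. In practice an automatic-differentiation Hessian-vector product would usually replace the finite difference.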