\begin{abstract}
Proper weight initialization is crucial for the effective
training and fast convergence of deep neural networks (DNNs).
Prior work in this area has mostly focused on
{\em balancing the variance
among weights per layer} to maintain the stability of (i) the input data propagated forwards through the network and (ii) the loss gradients propagated backwards.
This prevalent heuristic, however, is agnostic to dependencies among gradients across layers and captures only first-order effects.
In this paper, we propose and discuss an initialization principle that is
based on a {\em rigorous estimation of the global curvature of weights across
layers}, obtained by approximating and controlling the norm of their Hessian
matrix. The proposed approach is more systematic and recovers previous
results for DNNs with {\em smooth activation functions}, {\em dropout}, and {\em ReLU}.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks
confirm that tracking the Hessian norm is a useful diagnostic tool that helps initialize weights more rigorously.
\end{abstract}