\begin{abstract}

Neuron death is a complex phenomenon with implications for model
trainability: the deeper the network, the lower the probability of
finding a valid initialization. In this work, we derive both upper and
lower bounds on the probability that a ReLU network is initialized to a
trainable point, as a function of model hyperparameters. We show that it
is possible to increase the depth of a network indefinitely, so long as
the width increases as well. Furthermore, our bounds are asymptotically
tight under reasonable assumptions. First, the upper bound coincides
with the true probability for a single-layer network with the largest
possible input set. Second, the true probability converges to our lower
bound as the input set shrinks to a single point, or as the network
complexity grows under an assumption about the output variance. We
confirm these results by numerical simulation, showing rapid convergence
to the lower bound with increasing network depth. Then, motivated by the
theory, we propose a practical sign-flipping scheme guaranteeing that the
fraction of living data points in a $k$-layer network is at least
$2^{-k}$. Finally, we show how these issues are mitigated by network
design features currently seen in practice, such as batch normalization,
residual connections, dense networks, and skip connections. This
suggests that neuron death may provide insight into the efficacy of
various model architectures.

\end{abstract}
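To make the depth-versus-width trade-off concrete, here is a minimal Monte Carlo sketch for the simplest case mentioned above, a single input point. It assumes (our assumption, not necessarily the paper's setup) that every layer has the same width $n$, that weights and biases are i.i.d. standard Gaussian, and that an input counts as dead once some layer outputs the all-zero vector. Under these assumptions each neuron's pre-activation on a nonzero input is symmetric and continuous, so a layer is entirely inactive with probability $2^{-n}$, independently of the previous layers, and the survival probability through $k$ layers is $(1-2^{-n})^k$. We do not claim this is the paper's bound expression; the function names are hypothetical.

```python
import random


def alive_probability_formula(width: int, depth: int) -> float:
    """(1 - 2^-n)^k: survival probability of one input point, assuming
    i.i.d. zero-mean Gaussian weights and biases (assumption of this sketch)."""
    return (1.0 - 2.0 ** -width) ** depth


def is_alive(x: list, width: int, depth: int, rng: random.Random) -> bool:
    """Propagate x through a freshly sampled ReLU network; return False as
    soon as some layer outputs the all-zero vector."""
    h = list(x)
    for _ in range(depth):
        # Neuron j: z_j = w_j . h + b_j with fresh standard normal w_j, b_j.
        z = [sum(rng.gauss(0.0, 1.0) * hi for hi in h) + rng.gauss(0.0, 1.0)
             for _ in range(width)]
        h = [max(0.0, zj) for zj in z]
        if all(v == 0.0 for v in h):
            return False
    return True


def estimate_alive_probability(width: int, depth: int, trials: int = 2000,
                               seed: int = 0, input_dim: int = 4) -> float:
    """Fraction of independently initialized networks that keep x alive."""
    rng = random.Random(seed)
    x = [1.0] * input_dim
    hits = sum(is_alive(x, width, depth, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for width, depth in [(3, 5), (3, 20), (6, 20)]:
        est = estimate_alive_probability(width, depth)
        exact = alive_probability_formula(width, depth)
        print(f"n={width:2d} k={depth:2d}  simulated={est:.3f}  formula={exact:.3f}")
```

By Bernoulli's inequality, $(1-2^{-n})^k \ge 1 - k\,2^{-n}$, so under these assumptions letting the width grow roughly like $\log_2 k$ keeps the survival probability bounded away from zero as the depth increases.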
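The abstract does not spell out the sign-flipping rule. The sketch below is a hypothetical greedy variant, assumed only to show how a per-layer factor of $1/2$ can yield a $2^{-k}$ guarantee: negating a neuron's weights and bias makes it active exactly where its pre-activation was strictly negative, so one of the two signs is active on at least half of the living points (exact zeros are ignored; they have probability zero for Gaussian weights). Keeping the better sign of one neuron at each of the $k$ layers therefore retains at least half of the living points per layer. The function names, the Gaussian initialization, and the choice of neuron 0 are illustrative assumptions, not necessarily the authors' scheme.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights W[j][i], biases b[j])


def sample_network(input_dim: int, width: int, depth: int,
                   rng: random.Random) -> List[Layer]:
    """Hypothetical helper: i.i.d. standard Gaussian weights and biases."""
    layers: List[Layer] = []
    d = input_dim
    for _ in range(depth):
        W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
        b = [rng.gauss(0.0, 1.0) for _ in range(width)]
        layers.append((W, b))
        d = width
    return layers


def relu_layer(layer: Layer, h: List[float]) -> List[float]:
    """Apply one fully connected ReLU layer to the vector h."""
    W, b = layer
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + bj)
            for row, bj in zip(W, b)]


def greedy_sign_flip(layers: List[Layer], points: List[List[float]]) -> float:
    """Hypothetical greedy rule, applied layer by layer (modifies `layers`):
    negate neuron 0's weights and bias whenever that makes it active on more
    of the currently living points. Returns the final fraction of living points.
    """
    living = [list(p) for p in points]
    for W, b in layers:
        # Pre-activation of neuron 0 on every point that is still alive.
        z0 = [sum(w * x for w, x in zip(W[0], h)) + b[0] for h in living]
        positive = sum(z > 0.0 for z in z0)
        negative = sum(z < 0.0 for z in z0)
        if negative > positive:
            # Negating (w, b) swaps the strictly positive and negative sets.
            W[0] = [-w for w in W[0]]
            b[0] = -b[0]
        # A point stays alive if at least one neuron in this layer is active;
        # neuron 0 alone now keeps at least half of them (barring exact zeros).
        outputs = [relu_layer((W, b), h) for h in living]
        living = [h for h in outputs if any(v > 0.0 for v in h)]
    return len(living) / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    depth = 10
    points = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(500)]
    net = sample_network(input_dim=5, width=2, depth=depth, rng=rng)
    frac = greedy_sign_flip(net, points)
    print(f"living fraction {frac:.3f} >= 2^-{depth} = {2.0 ** -depth:.5f}")
```

Only one neuron per layer is constrained here, so the remaining neurons can keep additional points alive; the $2^{-k}$ figure is a worst-case floor, not a typical value.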