\begin{abstract}
Neuron death is a complex phenomenon with implications for model trainability:
the deeper the network, the lower the probability of finding a valid
initialization. In this work, we derive both upper and lower bounds
on the probability that a ReLU network is initialized to a trainable
point, as a function of model hyperparameters. We show that it is
possible to increase the depth of a network indefinitely, so long
as the width increases as well. Furthermore, our bounds are asymptotically
tight under reasonable assumptions: first, the upper bound coincides
with the true probability for a single-layer network with the largest
possible input set. Second, the true probability converges to our
lower bound as the input set shrinks to a single point, or as the
network complexity grows under an assumption about the output variance.
We confirm these results by numerical simulation, showing rapid convergence
to the lower bound with increasing network depth. Then, motivated
by the theory, we propose a practical sign flipping scheme which guarantees
that the ratio of living data points in a $k$-layer network is at
least $2^{-k}$. Finally, we show how these issues are mitigated by
network design features currently seen in practice, such as batch
normalization, residual connections, dense networks and skip connections.
This suggests that neuron death may provide insight into the efficacy
of various model architectures.
\end{abstract}
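
The $2^{-k}$ guarantee stated in the abstract can be read as a simple recursion. As an assumption made only for illustration (the abstract does not state the mechanism), suppose that at every layer the sign choice keeps alive at least half of the data points that were still alive after the previous layer. Writing $r_j$ for the fraction of living data points after layer $j$ (notation introduced here, not taken from the paper), with $r_0 = 1$, this gives
\[
r_j \ge \frac{1}{2}\, r_{j-1} \quad (j = 1, \dots, k)
\qquad \Longrightarrow \qquad
r_k \ge 2^{-k}.
\]
For example, $k = 10$ yields $r_{10} \ge 2^{-10} = 1/1024 \approx 9.8 \times 10^{-4}$, so under this reading the guaranteed fraction is strictly positive for any finite depth but decays exponentially in $k$.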