\begin{abstract}
We analyze the joint probability distribution of the lengths of the
vectors of hidden variables in different layers of a fully connected
deep network, when the weights and biases are chosen randomly according to
Gaussian distributions, and the input is in $\{ -1, 1\}^N$. We show
that, if the activation function $\phi$ satisfies a minimal set of
assumptions, which hold for all activation functions that we know of that
are used in practice, then, as the width of the network gets large,
the ``length process'' converges in probability to a length map
that is determined by a simple function of the variances of the
random weights and biases and the activation function $\phi$.

We also show that this convergence may fail for $\phi$ that violate our assumptions.
\end{abstract}
