\begin{abstract}
We study the dynamics of gradient descent on objective
functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar
parameters $w_1,\ldots,w_k$), which arise in the context of
training depth-$k$ linear neural networks. We prove that for standard
random initializations, and under mild assumptions on $f$, the number of
iterations required for convergence scales exponentially with the depth
$k$. We also show empirically that this phenomenon can occur in higher
dimensions, where each $w_i$ is a matrix. This highlights a potential
obstacle in understanding the convergence of gradient-based methods for
deep linear neural networks, where $k$ is large.
\end{abstract}
