\begin{abstract}
Understanding the capabilities and limitations of different network architectures
is of fundamental importance to machine learning. Bayesian inference
on Gaussian processes has proven to be a viable approach for studying
recurrent and deep networks in the limit of infinite layer width,
$n\to\infty$. Here we present a unified and systematic derivation
of the mean-field theory for both architectures that starts from first
principles by employing established methods from the statistical physics
of disordered systems. The theory elucidates that while the mean-field
equations differ with regard to their temporal structure, they
nevertheless yield identical Gaussian kernels when readouts are taken at a
single time point or layer, respectively. Bayesian inference applied
to classification then predicts identical performance and capabilities
for the two architectures. Numerically, we find that convergence towards
the mean-field theory is typically slower for recurrent networks than
for deep networks, and that the convergence speed depends non-trivially
on the parameters of the weight prior as well as on the depth or number
of time steps, respectively. Our method exposes that Gaussian processes
are but the lowest order of a systematic expansion in $1/n$. The
formalism thus paves the way to investigate the fundamental differences
between recurrent and deep architectures at finite widths $n$.
\end{abstract}