\begin{abstract}
Understanding the capabilities and limitations of different network architectures
is of fundamental importance to machine learning. Bayesian inference
on Gaussian processes has proven to be a viable approach for studying
recurrent and deep networks in the limit of infinite layer width,
$n\to\infty$. Here we present a unified and systematic derivation
of the mean-field theory for both architectures that starts from first
principles by employing established methods from the statistical physics
of disordered systems. The theory elucidates that while the mean-field
equations differ in their temporal structure, they nevertheless
yield identical Gaussian kernels when readouts are taken at a
single time point or layer, respectively. Bayesian inference applied
to classification then predicts identical performance and capabilities
for the two architectures. Numerically, we find that convergence towards
the mean-field theory is typically slower for recurrent networks than
for deep networks, and that the convergence speed depends non-trivially
on the parameters of the weight prior as well as on the depth or number
of time steps, respectively. Our method exposes that Gaussian processes
are but the lowest order of a systematic expansion in $1/n$. The
formalism thus paves the way to investigate the fundamental differences
between recurrent and deep architectures at finite widths $n$.
\end{abstract}