\begin{abstract}
In this paper we explore a connection between deep networks and learning in reproducing kernel {\krein} space.  Our approach is based on the concept of push-forward: converting a fixed non-linear transform applied to a linear projection into a linear projection applied to the output of a fixed non-linear transform, thereby {\em pushing} the weights {\em forward} through the non-linearity.  Applying this repeatedly from the input to the output of a deep network, the weights can be progressively ``pushed'' to the output layer.  The result is a flat network consisting of a fixed non-linear map (whose form is determined by the structure of the deep network) followed by a linear projection determined by the weight matrices; that is, we convert a deep network into an equivalent (indefinite) kernel machine.  We then investigate the implications of this transformation for capacity control and uniform convergence, and provide a Rademacher complexity bound on the deep network in terms of Rademacher complexity in reproducing kernel {\krein} space.  Finally, we analyse the sparsity properties of the flat representation, showing that the flat weights are (effectively) $L_p$-``norm'' regularised with $p \in (0,1)$ (bridge regression).
\end{abstract}
