9dab81a22a4e4ba9.tex
1: \begin{abstract}
2:   \noindent 
3:   In this paper we develop a statistical theory and an implementation of deep
4:   learning (DL) models. We show that an  elegant variable splitting scheme for
5:   the alternating direction method of multipliers (ADMM) optimises a deep
6:   learning objective.  We allow for non-smooth non-convex regularisation
7:   penalties to induce sparsity in parameter weights.  We provide a link between
8:   traditional shallow-layer statistical models, such as principal component and
9:   sliced inverse regression, and deep-layer models. We also define the degrees
10:   of freedom of a deep learning predictor and a predictive mean squared error (MSE) criterion for
11:   model selection when comparing architecture designs. We focus on deep
12:   multi-class logistic learning although our methods apply more generally.  Our
13:   results suggest an interesting and previously under-exploited relationship
14:   between deep learning and proximal splitting techniques.  To illustrate our
15:   methodology, we provide a multi-class logit classification analysis of
16:   Fisher's Iris data and demonstrate the convergence of our algorithm. Finally, we conclude
17:   with directions for future research.  
18:   
19:   \vspace{0.1in}
20:   \noindent Keywords: Deep Learning; Sparsity; Dropout; Convolutional Neural Nets;
21:   Regularisation; Bayesian MAP; Image Segmentation; Classification; Multi-class Logistic Regression.
22: \end{abstract}
23: