\begin{abstract}
\noindent
In this paper we develop a statistical theory and an implementation of deep
learning (DL) models. We show that an elegant variable splitting scheme for
the alternating direction method of multipliers (ADMM) optimises a deep
learning objective. We allow for non-smooth, non-convex regularisation
penalties to induce sparsity in parameter weights. We provide a link between
traditional shallow layer statistical models, such as principal component and
sliced inverse regression, and deep layer models. We also define the degrees
of freedom of a deep learning predictor and a predictive MSE criterion for
model selection when comparing architecture designs. We focus on deep
multi-class logistic learning, although our methods apply more generally. Our
results suggest an interesting and previously under-exploited relationship
between deep learning and proximal splitting techniques. To illustrate our
methodology, we provide a multi-class logit classification analysis of Fisher's Iris data and demonstrate
the convergence of our algorithm. Finally, we conclude
with directions for future research.

\vspace{0.1in}
\noindent Keywords: Deep Learning; Sparsity; Dropout; Convolutional Neural Nets;
Regularisation; Bayesian MAP; Image Segmentation; Classification; Multi-class Logistic Regression.
\end{abstract}