abstract:cd525499a7d6c5d5.tex

1: \begin{abstract}

2:

3: The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied.

4: However, our understanding of how the asymptotic convergence of backpropagation in deep architectures is related to the structural properties of the network

5: and other design choices (like denoising and dropout rate) is less clear at this time.

6: An interesting question one may ask is whether the network architecture and input data statistics may guide the choices of learning parameters and vice versa.

7: In this work, we explore the association between such structural, distributional and learnability aspects vis-\`a-vis their interaction with parameter convergence rates.

8: We present a framework to address these questions based on convergence of backpropagation for general nonconvex objectives using first-order information.

9: This analysis suggests an interesting relationship between feature denoising and dropout.

10: Building upon these results, we obtain a setup that provides systematic guidance regarding the choice of learning parameters and network sizes that

11: achieve a certain level of convergence (in the optimization sense) often mediated by statistical attributes of the inputs.

12: Our results are supported by a set of experimental evaluations as well as independent empirical observations reported by other groups.

13:

14: \end{abstract}

15: