\begin{abstract}
We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed-forward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe an example of compression through an autoencoder and compare it against sequential feed-forward networks under several metrics.
\end{abstract}