1: \begin{abstract}
2: This paper presents a new optimisation approach for training Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface, which enables better paths on the parameter manifold to be traversed. The method has been applied within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. The efficacy of the method is shown through experiments on a Multi-Genre Broadcast (MGB) transcription task, in which neural networks using sigmoid and ReLU activation functions have been investigated. It is shown that, for the same number of updates, the proposed approach achieves larger reductions in the word error rate (WER) than both NG and HF, and also leads to 
3: %better convergence than  standard stochastic gradient descent.
4: a lower WER than standard stochastic gradient descent.
5: 
6: 
7: % It is shown that this optimisation approach leads to larger reductions in the word error rate  than NG, HF or standard stochastic gradient descent.
8: 
9: % It is shown that this proposed approach achieves larger reductions in the word error rate and lead to better convergence  than NG, HF or standard stochastic gradient descent.  
10: %the proposed approach achieves achieves larger reductions in the word error rate than NG or HF and leads to better overall convergence than SGD.
11: 
12: % this proposed approach achieves larger reductions in the word error rate  for the same number of updates  than NG and HF and achieves better convergence than  standard stochastic gradient descent.
13: 	
14: 
15: \end{abstract}
16: