970a0657873b9687.tex
1: \begin{abstract}
2: 		In this paper we provide a constructive estimate of the
3: 		convergence rate for training a known class of neural networks: the
4: 		multi-class logistic regression.  Although this model has been used
5: 		successfully for several decades, our rigorous results appear to be new, reflecting the
6: 		gap between the practice and theory of machine learning. Training a
7: 		neural network is typically done via variations of the gradient descent
8: 		method. If a minimum of the loss function exists and gradient
9: 		descent is used as the training method, we provide an expression
10: 		that relates the learning rate to the rate of convergence to the
11: 		minimum. The method involves an estimate of the condition number of
12: 		the Hessian of the loss function. We also discuss the existence of a
13: 		minimum, since its existence is not automatic. One way to
14: 		ensure convergence is to assign positive probability to every class
15: 		in the training
16: 		dataset.
17: 	\end{abstract}
18: