\begin{abstract}
In this paper we provide a constructive estimate of the convergence
rate for training a well-known class of neural networks: multi-class
logistic regression. Although this model has been used successfully
for several decades, our rigorous results appear to be new, which
reflects the gap between the practice and the theory of machine
learning. Training a neural network is typically done via variations
of the gradient descent method. If a minimum of the loss function
exists and gradient descent is used as the training method, we provide
an expression that relates the learning rate to the rate of
convergence to the minimum. The method involves an estimate of the
condition number of the Hessian of the loss function. We also discuss
the existence of a minimum, which is not automatic. One way to ensure
convergence is to assign positive probability to every class in the
training dataset.
\end{abstract}
