\begin{abstract}
This paper revisits a special type of neural network known under
two names. In the statistics and machine learning community it is
known as a multi-class logistic regression neural network. In the
neural network community it is simply the soft-max layer. Its
importance is underscored by its role in deep learning: it is the last
layer, whose output is the classification of the input
patterns, such as images. Our exposition focuses on a mathematically
rigorous derivation of the key equation expressing the gradient. A
fringe benefit of our approach is a fully vectorized expression,
which is the basis of an efficient implementation. The second result
of this paper is the positivity of the second derivative of the
cross-entropy loss function as a function of the weights. This result
proves that optimization methods based on convexity may be used to
train this network. As a corollary, we demonstrate that no
$L^2$-regularizer is needed to guarantee convergence of gradient
descent, provided that a global minimum of the loss function exists.
We also provide an effective bound on the rate of convergence for
two classes.
\end{abstract}
