\begin{abstract}

  This paper revisits a special type of neural network known under
  two names. In the statistics and machine learning community it is
  known as a multi-class logistic regression neural network; in the
  neural network community it is simply the soft-max layer. Its
  importance is underscored by its role in deep learning, where it
  serves as the last layer, whose output is the classification of the
  input patterns, such as images. Our exposition focuses on a
  mathematically rigorous derivation of the key equation expressing the
  gradient. A fringe benefit of our approach is a fully vectorized
  expression, which serves as the basis for an efficient
  implementation. The second result of this paper is the positivity of
  the second derivative of the cross-entropy loss function as a
  function of the weights. This result shows that optimization methods
  based on convexity may be used to train this network. As a
  corollary, we demonstrate that no $L^2$-regularizer is needed to
  guarantee convergence of gradient descent, provided that a global
  minimum of the loss function exists. We also provide an effective
  bound on the rate of convergence in the case of two classes.

\end{abstract}
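
For orientation only, the following sketch records the standard form of
the gradient and of the second derivative in one common notation. This
notation is assumed here and need not coincide with the notation used in
the body of the paper. Let $X$ be the $N\times D$ matrix whose rows
$x_n^{\mathsf T}$ are the input patterns, let $W$ be the $D\times K$
weight matrix for $K$ classes, let $Y$ be the row-wise soft-max of
$Z = XW$, and let $T$ be the $N\times K$ matrix of one-hot targets. For
the cross-entropy loss
$L(W) = -\sum_{n=1}^{N}\sum_{k=1}^{K} t_{nk}\log y_{nk}$ one obtains the
vectorized gradient
\[
  \nabla_W L = X^{\mathsf T}(Y - T),
\]
and, for every $D\times K$ direction $V$, with $u_n = V^{\mathsf T}x_n$,
\[
  \frac{d^2}{d\varepsilon^2}\, L(W + \varepsilon V)\Big|_{\varepsilon=0}
  = \sum_{n=1}^{N}\Bigl(\sum_{k=1}^{K} y_{nk}\,u_{nk}^2
    - \Bigl(\sum_{k=1}^{K} y_{nk}\,u_{nk}\Bigr)^{2}\Bigr) \ge 0 .
\]
The first identity uses $\partial L/\partial z_{nk} = y_{nk} - t_{nk}$,
which holds because each row of $T$ sums to one. The second follows from
$\partial y_{nk}/\partial z_{nj} = y_{nk}(\delta_{kj} - y_{nj})$, and each
summand is nonnegative since it is the variance of $u_{nk}$ under the
probability vector $(y_{n1},\dots,y_{nK})$.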