\begin{abstract}

We derive fast convergence rates of a deep neural network (DNN) classifier with the
rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for the true model:
(1) a smooth decision boundary, (2) a smooth conditional class probability, and (3) the margin condition
(i.e., the probability of inputs near the decision boundary is small). We show that
the DNN classifier learned using the hinge loss achieves fast convergence rates in all three
cases, provided that the architecture (i.e., the number of layers, the number of nodes, and the sparsity)
is carefully selected. An important implication is that DNN architectures are flexible enough
to be used in various cases without much modification. In addition, we consider a DNN
classifier learned by minimizing the cross-entropy and show that
it achieves a fast convergence rate under the condition that
the conditional class probabilities of most data points are sufficiently close to either 0 or 1.
This assumption is not unusual in image recognition because human beings are extremely good at
recognizing most images. To support our theoretical results, we present a small numerical study comparing the hinge loss and the cross-entropy.


\noindent
Keywords: Classification, Deep neural network, Excess risk, Fast convergence rate

\end{abstract}
