1: \begin{abstract}
2: We derive the fast convergence rates of a deep neural network (DNN) classifier with the
3: rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for a true model:
4: (1) a smooth decision boundary, (2) smooth conditional class probability, and (3) the margin condition
5: (i.e., the probability of inputs near the decision boundary is small). We show that
6: the DNN classifier learned using the hinge loss achieves fast convergence rates for all three
7: cases provided that the architecture (i.e., the number of layers, number of nodes, and sparsity)
8: is carefully selected. An important implication is that DNN architectures are very flexible
9: for use in various cases without much modification. In addition, we consider a DNN
10: classifier learned by minimizing the cross-entropy, and show that
11: the DNN classifier achieves a fast convergence rate under the condition that
12: the conditional class probabilities of most data are sufficiently close to either one or zero.
13: This assumption is not unusual for image recognition because human beings are extremely good at
14: recognizing most images. To confirm our theoretical explanation, we present the results of a small numerical study conducted to compare the hinge loss and cross-entropy.
15: 
16: 
17: \noindent
18:  Keywords: Classification, Deep neural network, Excess risk, Fast convergence rate 
19:   \end{abstract}
20: