\begin{abstract}%
In the theoretical analysis of deep learning,
identifying which properties of deep neural networks lead to good performance is an important task.
In this paper,
using the framework for analyzing the generalization error developed in \cite{Suzuki18},
we derive a fast learning rate for deep neural networks with more general activation functions.
In \cite{Suzuki18},
a tight generalization error bound for deep learning was derived
under the assumption that the activation functions are scale invariant,
and it was remarked there that this scale invariance is essential for deriving tight error bounds.
%, and do not provide the generalization error bound without using the scale invariance.
Whereas the rectified linear unit (ReLU; \citealp{NairHinton10}) is scale invariant,
other well-known activation functions, including the sigmoid function, the hyperbolic tangent,
and the exponential linear unit (ELU; \citealp{ClevertEtAl16}), are not.
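To make the condition concrete, we read scale invariance as positive homogeneity of the activation function $\eta$ (this formalization is our reading of the condition in \cite{Suzuki18}):
\[
  \eta(a x) = a\,\eta(x) \qquad \mbox{for all } a > 0 \mbox{ and all real } x.
\]
ReLU satisfies this because $\max\{ax, 0\} = a\max\{x, 0\}$ for $a > 0$, whereas the sigmoid $\sigma$, for example, already violates it at $x = 0$, since $\sigma(a \cdot 0) = 1/2 \neq a/2 = a\,\sigma(0)$ for $a \neq 1$.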
The existing analysis therefore leaves open the possibility
that deep networks with non-scale-invariant activations converge only at the slower rate of $O(1/\sqrt{n})$, where $n$ is the sample size,
whereas those with scale-invariant activations attain a rate faster than $O(1/\sqrt{n})$.
In this paper, without assuming scale invariance of the activation functions,
we derive a tight generalization error bound that is essentially the same as that of \cite{Suzuki18}.
This result shows that, at least within the framework of \cite{Suzuki18},
the scale invariance of the activation functions is not essential for obtaining a fast convergence rate.
It also shows that
the theoretical framework proposed in \cite{Suzuki18} applies broadly to the analysis of deep learning with general activation functions.
\end{abstract}