\begin{abstract}%
In the theoretical analysis of deep learning, identifying which properties of deep learning lead to its good performance is an important task.
In this paper, using the framework for analyzing the generalization error developed in \cite{Suzuki18}, we derive a fast learning rate for deep neural networks with more general activation functions.
In \cite{Suzuki18}, a tight generalization error bound for deep learning was derived under the assumption that the activation functions are scale invariant.
That work states that the scale invariance of the activation function is essential for deriving tight error bounds.
%, and do not provide the generalization error bound without using the scale invariance.
While the rectified linear unit (ReLU; \citealp{NairHinton10}) is scale invariant, other popular activation functions, including the sigmoid function, the hyperbolic tangent, and the exponential linear unit (ELU; \citealp{ClevertEtAl16}), do not satisfy this condition.
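Here, scale invariance is read as positive homogeneity; this formalization and the symbol $\eta$ for a generic activation function are introduced only for illustration and are not quoted from \cite{Suzuki18}:
\[
  \eta(a x) = a\,\eta(x) \quad \mbox{for all } a > 0 \mbox{ and all real } x.
\]
For example, $\max\{ax, 0\} = a\max\{x, 0\}$ for every $a > 0$, so the ReLU satisfies this identity, whereas $\tanh(2x) = 2\tanh(x)/(1 + \tanh^2(x)) \neq 2\tanh(x)$ for every $x \neq 0$.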
The existing analysis therefore suggests the possibility that deep learning with non-scale-invariant activations may converge only at the slower rate $O(1/\sqrt{n})$, where $n$ is the sample size, whereas deep learning with scale-invariant activations can attain a rate faster than $O(1/\sqrt{n})$.
In this paper, without assuming scale invariance of the activation functions, we derive a tight generalization error bound that is essentially the same as that of \cite{Suzuki18}.
This result shows that, at least within the framework of \cite{Suzuki18}, the scale invariance of the activation functions is not essential for obtaining a fast rate of convergence.
It also shows that the theoretical framework proposed in \cite{Suzuki18} can be widely applied to the analysis of deep learning with general activation functions.
\end{abstract}