abstract:014c4a86e3b81855.tex

1: \begin{abstract}

2: Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks.

3: Most of these studies focus on regression problems with the squared loss function, with a few exceptions, and point out the importance of the positivity of the {\it neural tangent kernel}.

4: In contrast, the performance of gradient descent on classification problems with the logistic loss function has not been well studied, leaving room for further investigation of this problem structure.

5: In this work, we demonstrate that a separability assumption based on a {\it neural tangent} model is more reasonable than the positivity condition of the neural tangent kernel, and we provide a refined convergence analysis of gradient descent for two-layer networks with smooth activations.

6: Notably, our convergence and generalization bounds depend much more favorably on the network width than those of related studies.

7: Consequently, our theory provides a generalization guarantee for less over-parameterized two-layer networks, whereas most existing studies require much higher over-parameterization.

8: \end{abstract}

9: