1: \begin{abstract}
2: Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks.
3: With a few exceptions, most of these studies focus on regression problems with the squared loss function, and they point out the importance of the positivity of the {\it neural tangent kernel}.
4: In contrast, the performance of gradient descent on classification problems with the logistic loss function has not been well studied, and the structure of this problem leaves room for further investigation.
5: In this work, we demonstrate that a separability assumption based on a {\it neural tangent} model is more reasonable than the positivity condition on the neural tangent kernel, and we provide a refined convergence analysis of gradient descent for two-layer networks with smooth activations.
6: Notably, our convergence and generalization bounds depend much more favorably on the network width than those of related studies.
7: Consequently, our theory provides a generalization guarantee for less over-parameterized two-layer networks, whereas most existing studies require much heavier over-parameterization.
8: \end{abstract}
9: