
\begin{abstract}

We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space.

In the traditional analysis based on the consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk, even under a low-noise condition on the conditional label probabilities.

Consequently, the resulting rate is sublinear.

Therefore, it is important to ask whether much faster convergence of the expected classification error can be achieved.

In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the theoretical analysis provided was limited to the squared loss function, which is somewhat inadequate for binary classification tasks.

In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.

As for the averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training.

In experiments, we verify our analyses on $L_2$-regularized logistic regression.

\end{abstract}
