\begin{abstract}
We consider stochastic gradient descent and its averaged variant for binary classification in a reproducing kernel Hilbert space.
In the traditional analysis, which relies on a consistency property of loss functions, the expected classification error is known to converge more slowly than the expected risk, even under a low-noise condition on the conditional label probabilities.
Consequently, the resulting rate for the classification error is only sublinear.
It is therefore important to ask whether much faster convergence of the expected classification error can be achieved.
Recent work established an exponential convergence rate for stochastic gradient descent under a strong low-noise condition, but its theoretical analysis was limited to the squared loss, which is somewhat inadequate for binary classification.
In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.
For averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training.
We verify our analysis experimentally on $L_2$-regularized logistic regression.
\end{abstract}
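To make the setting concrete, the following is a minimal sketch of plain and averaged stochastic gradient descent for $L_2$-regularized logistic regression in a reproducing kernel Hilbert space. It is an illustration, not the method or code used in this paper. It assumes a Gaussian kernel, a constant step size, $g_0 = 0$, and a single pass over the data. Each iterate is stored as a kernel expansion $g_t = \sum_i \alpha_i k(x_i, \cdot)$, and the update is $g_{t+1} = (1 - \eta\lambda) g_t + \eta\, y_t\, \sigma(-y_t g_t(x_t))\, k(x_t, \cdot)$, where $\sigma$ is the logistic sigmoid. The averaged iterate is $\bar g_T = \frac{1}{T+1}\sum_{t=0}^{T} g_t$. The function names (\texttt{gaussian\_kernel}, \texttt{kernel\_sgd\_logistic}, \texttt{predict}, \texttt{sample}), the parameters \texttt{eta}, \texttt{lam}, \texttt{sigma}, and the toy margin-separated data are hypothetical choices made for this sketch.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def kernel_sgd_logistic(X, y, eta=1.0, lam=1e-3, sigma=0.5):
    """One pass of SGD on the L2-regularized logistic loss in an RKHS (sketch).

    Returns expansion coefficients (over the rows of X) of the final iterate g_T
    and of the averaged iterate (g_0 + ... + g_T) / (T + 1).
    """
    T = len(y)
    alpha = np.zeros(T)      # coefficients of the current iterate g_t, with g_0 = 0
    alpha_bar = np.zeros(T)  # running mean of the coefficients of g_0, ..., g_t
    for t in range(T):
        # g_t(x_t): only the first t coefficients can be nonzero (empty sum at t = 0).
        f = gaussian_kernel(X[t:t + 1], X[:t], sigma)[0] @ alpha[:t]
        # -d/df log(1 + exp(-y f)) = y * sigmoid(-y f), computed stably via tanh.
        weight = y[t] * 0.5 * (1.0 - np.tanh(0.5 * y[t] * f))
        # Regularization shrinks old coefficients; the new point gets a fresh one.
        alpha *= 1.0 - eta * lam
        alpha[t] = eta * weight
        # Update the mean over g_0, ..., g_{t+1}, which has t + 2 terms.
        alpha_bar += (alpha - alpha_bar) / (t + 2)
    return alpha, alpha_bar


def predict(alpha, X_train, X_test, sigma=0.5):
    """Predicted labels sign(g(x)) in {-1, +1} for a kernel expansion g."""
    scores = gaussian_kernel(X_test, X_train, sigma) @ alpha
    return np.where(scores >= 0, 1, -1)


rng = np.random.default_rng(0)


def sample(n):
    """Hypothetical toy data: labels sign(x_1), with a band |x_1| <= 0.2 removed."""
    X = rng.uniform(-1.0, 1.0, size=(4 * n, 2))
    X = X[np.abs(X[:, 0]) > 0.2][:n]
    return X, np.where(X[:, 0] > 0, 1, -1)


X_tr, y_tr = sample(500)
X_te, y_te = sample(1000)
alpha, alpha_bar = kernel_sgd_logistic(X_tr, y_tr)
err_final = np.mean(predict(alpha, X_tr, X_te) != y_te)
err_avg = np.mean(predict(alpha_bar, X_tr, X_te) != y_te)
print(f"test error: final iterate {err_final:.3f}, averaged iterate {err_avg:.3f}")
```

The toy data remove a band around the decision boundary so that the conditional label probability is bounded away from $1/2$, loosely mimicking a strong low-noise condition. The sketch only shows the mechanics of the two iterates and makes no claim about convergence rates.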