
\begin{abstract}

    Self-training, a semi-supervised learning (SemiSL) algorithm, improves test accuracy by exploiting unlabeled data, especially when labeled data are scarce. Despite its empirical success, the role of unlabeled data in improving training performance is not yet fully understood.

    % Inspired by recent developments of techniques by generalization analysis of supervised learning in supervised learning problems,

    In this paper, we explore the performance of the classic iterative self-training method when the amount of labeled data is inadequate.

    Assuming the input data follow a Gaussian distribution, we prove that a sufficiently large amount of unlabeled data improves both the generalization error and the convergence rate, and our theoretical results quantitatively characterize these improvements as a function of the number of unlabeled samples. Our proofs are built on the analysis of a one-hidden-layer neural network, and the conclusions are verified on both synthetic and real data.

\end{abstract}
