\begin{abstract}
\noindent
Presently, the most successful approaches to semi-supervised
learning are based on \emph{consistency regularization}, whereby a model is
trained to be robust to small perturbations of its inputs and parameters.
To better understand consistency regularization, we conceptually explore how loss geometry
interacts with training procedures.
The consistency loss dramatically improves generalization performance over
supervised-only training;
however, we show that SGD struggles to converge on the consistency loss and keeps taking large steps
that change its predictions on the test data.
Motivated by these observations, we propose to train consistency-based methods
with Stochastic Weight Averaging (SWA), a recent approach that averages weights
along the trajectory of SGD with a modified learning rate schedule.
We also propose \emph{fast-SWA},
which further accelerates convergence by averaging multiple points within each
cycle of a cyclical learning rate schedule.
With weight averaging, we achieve the
best known semi-supervised results on CIFAR-10 and CIFAR-100 across many different amounts of
labeled training data. For example, we achieve 5.0\% error on CIFAR-10 with only 4000 labels,
compared with 6.3\%, the previous best result in the literature.

\end{abstract}
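The following equation illustrates the kind of weight averaging referred to above. It uses the standard SWA formulation. The notation is ours and is illustrative only. Let $w_1, \dots, w_n$ denote the weights collected along the SGD trajectory. The averaged solution $\bar{w}_n$ is their equal-weight mean, and it can be maintained as a running average without storing the individual $w_i$:
\[
  \bar{w}_n = \frac{1}{n}\sum_{i=1}^{n} w_i
  \qquad\Longleftrightarrow\qquad
  \bar{w}_n = \frac{n-1}{n}\,\bar{w}_{n-1} + \frac{1}{n}\,w_n ,
  \qquad \bar{w}_1 = w_1 .
\]
Under this reading, SWA and fast-SWA differ only in which points enter the average. With a cyclical learning rate schedule, SWA typically collects one set of weights per cycle, whereas fast-SWA collects several points within each cycle. As a result, fast-SWA forms the average from more samples over the same number of epochs.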
