\begin{abstract}

\noindent
Presently, the most successful approaches to semi-supervised
learning are based on \emph{consistency regularization}, whereby a model is
trained to be robust to small perturbations of its inputs and parameters. To
understand consistency regularization, we conceptually explore how loss geometry
interacts with training procedures.
The consistency loss dramatically improves generalization performance over
supervised-only training;
however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to
changes in predictions on the test data.
Motivated by these observations, we propose to train consistency-based methods
with Stochastic Weight Averaging (SWA), a recent approach that averages weights
along the trajectory of SGD with a modified learning rate schedule.
We also propose \emph{fast-SWA},
which further accelerates convergence by averaging multiple points within each
cycle of a cyclical learning rate schedule.
With weight averaging, we achieve the
best known semi-supervised results on CIFAR-10 and CIFAR-100 across many different quantities of
labeled training data. For example, we achieve 5.0\% error on CIFAR-10 with only 4000 labels,
compared to the previous best result in the literature of 6.3\%.

\end{abstract}