
\begin{abstract}
  We present a theoretical framework recasting data augmentation as stochastic
  optimization for a sequence of time-varying proxy losses. This provides a unified approach
  to understanding techniques commonly thought of as data augmentation, including
  synthetic noise and label-preserving transformations, as well as more traditional
  ideas in stochastic optimization such as learning rate and batch size scheduling.
  We prove a time-varying Monro-Robbins theorem with rates of convergence, which gives
  conditions on the learning rate and augmentation schedule under which augmented
  gradient descent converges. Special cases give provably good joint schedules
  for augmentation with additive noise, minibatch SGD, and minibatch SGD with noise.
\end{abstract}
