\begin{abstract}
Large-batch SGD is important for scaling the training of deep neural networks.
%
However, without carefully tuned hyperparameter schedules, the generalization of the model may suffer.
%
%Recent theoretical insights and practical methods for large batch training allow certain models to be trained with little to no impact on final accuracy. Most works focus on modifying the learning rate schedule to mitigate this accuracy degradation.
%
We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. %, in order to boost the learning process.
%
Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling.
%
We analyze the effect of batch augmentation on gradient variance, and show empirically that it improves convergence for a wide variety of deep neural networks and datasets.
%
Our results show that batch augmentation reduces the number of SGD updates needed to reach the same accuracy as the state of the art.
%
Overall, this simple yet effective method enables faster training and better generalization by allowing more computational resources to be used concurrently.

%is a step towards shifting supervised learning from an inherently limited strongly scaling problem towards.
%Thus we claimed that if you have the capability to train with large batch than you should.

%In addition, data augmentation techniques are known to improve the generalization capability of various models considerably.

%We propose to combine these two insights into a simple yet effective method
%

\end{abstract}
