abstract:078b3efa74d650fb.tex

\begin{abstract}
Deep neural networks have yielded superior performance in many
applications; however, gradient computation for a deep model trained on
millions of instances leads to a lengthy training process, even with modern
GPU/TPU hardware acceleration. In this paper, we propose \autoassist, a
simple framework to accelerate the training of a deep neural network.
Typically, as training proceeds, the improvement that a stochastic gradient
update on each instance brings to the current model varies dynamically.
In \autoassist, we exploit this fact and design a simple {\em instance shrinking} operation, which
filters out instances with relatively low marginal improvement to the
current model; thus the computationally intensive gradient computations are
performed on informative instances as much as possible. We prove that
the proposed technique outperforms vanilla SGD with existing importance
sampling approaches for linear SVM problems, and establish an $O(1/k)$
convergence rate for strongly convex problems. To apply the proposed
technique to the training of deep models, we propose to jointly train
a very lightweight \assistant network alongside the original deep network, referred to as \boss.
The \assistant network is designed to gauge the importance of a given
instance with respect to the {\em current} \boss such that a shrinking operation can
be applied in the batch generator. With careful design, we train the \boss and
\assistant in a nonblocking and asynchronous fashion so that the
overhead is minimal. We demonstrate that
\autoassist reduces the number of epochs needed to train a ResNet to the same test
accuracy on an image classification dataset by $40\%$, and saves $30\%$ of the training time
needed for a transformer model to yield the same BLEU scores on a translation
dataset.
\end{abstract}
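
% Illustrative note (not part of the original abstract). The notation
% $w$, $x_i$, $y_i$, $\lambda$, $F$, $C$ is assumed here for the sketch only.
As a brief illustration of why some instances bring little marginal
improvement in the linear SVM case, consider the standard regularized
hinge-loss objective, written in notation that we assume for this sketch:
\[
  F(w) \;=\; \frac{\lambda}{2}\,\|w\|^{2}
  \;+\; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, w^{\top} x_i\bigr).
\]
For any instance with $y_i\, w^{\top} x_i > 1$, the hinge term is identically
zero in a neighborhood of the current $w$, so its loss gradient vanishes and a
stochastic update on that instance changes $w$ only through the regularizer.
An instance shrinking operation can plausibly skip such instances without
losing information about the data. Under the usual reading, which we also
assume here, an $O(1/k)$ rate for a strongly convex $F$ with minimizer
$w^{\ast}$ means that after $k$ updates
\[
  \mathbb{E}\bigl[F(w_k)\bigr] - F(w^{\ast}) \;\le\; \frac{C}{k}
\]
for some constant $C > 0$ that does not depend on $k$.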