\begin{abstract}
Deep neural networks have yielded superior performance in many
applications; however, computing gradients for a deep model over millions of
training instances leads to a lengthy training process even with modern
GPU/TPU hardware acceleration. In this paper, we propose \autoassist, a
simple framework to accelerate the training of a deep neural network.
Typically, as training proceeds, the improvement that a stochastic gradient
update on a given instance brings to the current model varies dynamically.
In \autoassist, we exploit this fact and design a simple {\em instance shrinking} operation, which
filters out instances with relatively low marginal improvement to the
current model; thus the computationally intensive gradient computations are
performed on informative instances as much as possible. We prove that
the proposed technique outperforms vanilla SGD with existing importance
sampling approaches for linear SVM problems, and establish an $O(1/k)$
convergence rate for strongly convex problems. To apply the proposed
techniques to accelerate training of deep models, we jointly train
a very lightweight \assistant network alongside the original deep network, referred to as \boss.
The \assistant network is designed to gauge the importance of a given
instance with respect to the {\em current} \boss such that a shrinking operation can
be applied in the batch generator. With careful design, we train the \boss and
\assistant in a nonblocking and asynchronous fashion such that
the overhead is minimal. We demonstrate that
\autoassist reduces the number of epochs by $40\%$ for training a ResNet to reach the same test
accuracy on an image classification dataset, and saves $30\%$ of the training time
needed for a transformer model to reach the same BLEU score on a translation
dataset.
\end{abstract}
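To make the intuition behind instance shrinking concrete, consider the linear SVM case with the hinge loss. The notation below ($w$, $x_i$, $y_i$, $\lambda$, $\eta$) is assumed here purely for exposition, and the rule shown is a simplified sketch rather than the exact criterion used by \autoassist. For the objective
\begin{equation*}
  \min_{w} \; \frac{\lambda}{2}\,\|w\|^2 + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, w^\top x_i\bigr),
\end{equation*}
a stochastic subgradient step on instance $i$ with step size $\eta$ can be written as
\begin{equation*}
  w \;\leftarrow\; w - \eta\,\bigl(\lambda w - \mathbf{1}[\,y_i\, w^\top x_i < 1\,]\; y_i\, x_i\bigr).
\end{equation*}
Whenever $y_i\, w^\top x_i \ge 1$, the loss term contributes nothing and the step reduces to $w \leftarrow (1 - \eta\lambda)\, w$, which does not depend on $x_i$. Such an instance carries no information about the data at the current $w$, so filtering it out loses nothing from the loss term. Because $w$ changes during training, the set of such instances changes as well, which is why the filter has to track the current model.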