\begin{abstract}
The increasing size of neural network models has been critical for improvements in their accuracy, but device memory is not growing at the same rate.
This creates fundamental challenges for training neural networks in memory-limited environments.
In this work, we propose \method, a memory-efficient training framework that stores randomly quantized activations for backpropagation.
We prove the convergence of \method for general network architectures, and we characterize the impact of quantization on convergence via an exact expression for the gradient variance.
Using our theory, we propose novel mixed-precision quantization strategies that exploit the heterogeneity of activations across feature dimensions, samples, and layers.
These techniques can be readily applied to existing dynamic graph frameworks, such as PyTorch, simply by substituting the layers.
We evaluate \method on mainstream computer vision models for classification, detection, and segmentation tasks.
On all these tasks, \method compresses activations to 2 bits on average, with negligible accuracy loss.
\method reduces the memory footprint of activations by $12\times$, and it enables training with a $6.6\times$ to $14\times$ larger batch size.
We implement ActNN as a PyTorch library at \url{https://github.com/ucbrise/actnn}.
\end{abstract}
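
As a minimal sketch of the kind of unbiased random quantization the abstract refers to, and not necessarily the exact scheme used by the method, consider one group of activations with minimum $Z$ and range $R$, stored with $b$ bits. The symbols $Z$, $R$, $b$, $B = 2^b - 1$, $u$, $q$, and $\hat{h}$ are illustrative assumptions introduced here. An activation $h \in [Z, Z + R]$ can be scaled to $u = B\,(h - Z)/R$ and stochastically rounded to an integer $q$:
\[
q = \lceil u \rceil \ \mbox{with probability } u - \lfloor u \rfloor,
\qquad
q = \lfloor u \rfloor \ \mbox{otherwise}.
\]
Dequantizing as $\hat{h} = Z + (R/B)\,q$ then gives an unbiased estimate with bounded variance:
\[
\mathrm{E}[\hat{h}] = h,
\qquad
\mathrm{Var}[\hat{h}] = \frac{R^2}{B^2}\,\bigl(u - \lfloor u \rfloor\bigr)\bigl(1 - u + \lfloor u \rfloor\bigr) \le \frac{R^2}{4B^2}.
\]
Under these assumptions the variance shrinks as $b$ grows or as the group range $R$ narrows, which suggests why spending more bits on groups with a larger range can be worthwhile.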
