\begin{abstract}
The increasing size of neural network models has been critical for improvements in their accuracy, but device memory is not growing at the same rate.
This creates fundamental challenges for training neural networks within limited memory environments.
In this work, we propose \method, a memory-efficient training framework that stores randomly quantized activations for backpropagation.
We prove the convergence of \method for general network architectures, and we characterize the impact of quantization on the convergence via an exact expression for the gradient variance.
Using our theory, we propose novel mixed-precision quantization strategies that exploit the heterogeneity of activations across feature dimensions, samples, and layers.
These techniques can be readily applied to existing dynamic graph frameworks, such as PyTorch, simply by substituting the layers.
We evaluate \method on mainstream computer vision models for classification, detection, and segmentation tasks.
On all these tasks, \method compresses activations to 2 bits on average, with negligible accuracy loss.
\method reduces the memory footprint of the activations by $12\times$, and it enables training with a $6.6\times$ to $14\times$ larger batch size.
We implement ActNN as a PyTorch library at \url{https://github.com/ucbrise/actnn}.
\end{abstract}
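
To make the phrase ``randomly quantized activations'' concrete, the following is a minimal sketch of unbiased stochastic rounding applied to one group of activations.
The notation ($h$, $Z$, $R$, $b$, $B$, $p$, and the rounding operator $\mathrm{SR}$) is assumed here purely for illustration and does not appear in the abstract; the quantizer actually used by the library may differ in its details.
Let $h$ be an activation in a group with minimum $Z$ and range $R > 0$, and let $b$ be the number of bits, so that there are $B = 2^b - 1$ quantization intervals.
Write $\bar h = B (h - Z) / R \in [0, B]$ and $p = \bar h - \lfloor \bar h \rfloor$, and let $\mathrm{SR}(\bar h)$ equal $\lfloor \bar h \rfloor + 1$ with probability $p$ and $\lfloor \bar h \rfloor$ otherwise.
The stored value and its dequantized reconstruction are then
\[
  q = \mathrm{SR}(\bar h) \in \{0, 1, \ldots, B\},
  \qquad
  \hat h = Z + \frac{R}{B}\, q .
\]
Because $\mathrm{E}[\mathrm{SR}(\bar h)] = \lfloor \bar h \rfloor + p = \bar h$, the reconstruction is unbiased, and its variance is bounded:
\[
  \mathrm{E}[\hat h] = h,
  \qquad
  \mathrm{Var}[\hat h] = \frac{R^2}{B^2}\, p\,(1 - p) \le \frac{R^2}{4 B^2} .
\]
Under these assumptions, a 2-bit code ($b = 2$, $B = 3$) gives a per-element variance of at most $R^2 / 36$, which illustrates why a small range $R$ within each group matters at low bit widths.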