abstract:c828aa263b4b40cd.tex

1: \begin{abstract}

2: % The abstract should briefly summarize the contents of the paper in

3: % 150--250 words.

4:

5: 	The ever-growing scale of deep neural networks (DNNs) has led to an equally

6: 	rapid growth in computational resource requirements. Many recent

7: 	architectures, most prominently Large Language Models, have to be trained

8: 	using supercomputers with thousands of accelerators, such as GPUs or TPUs.

9: 	Besides the vast number of floating-point operations, the memory footprint of

10: 	DNNs is also exploding. In contrast, GPU architectures are notoriously short

11: 	on memory. Even comparatively small architectures like some

12: 	\emph{EfficientNet} variants cannot be trained on a single consumer-grade GPU

13: 	at reasonable mini-batch sizes. During training, intermediate input

14: 	activations have to be stored until backpropagation for gradient calculation.

15: 	These make up the vast majority of the memory footprint. In this work we

16: 	therefore consider compressing activation maps for the backward pass using

17: 	pooling, which can reduce both the memory footprint and the amount of data

18: 	movement. The forward computation remains uncompressed. We empirically show

19: 	convergence and study the effects on feature detection using the example of the

20: 	common vision architecture \emph{ResNet}. With this approach we are able to

21: 	reduce the peak memory consumption by 29\% at the cost of a longer training

22: 	schedule, while maintaining prediction accuracy compared to an uncompressed

23: 	baseline.

24:

25: \keywords{Compression, Deep Neural Networks, Training, Backpropagation}

26: \end{abstract}

27: