63fbcdd8557205de.tex
1: \begin{abstract}
2:     The increasing computational requirements of deep neural networks (DNNs) have led to significant interest in obtaining DNN models that are \emph{sparse}, yet \emph{accurate}. Recent work has investigated the even harder case of \emph{sparse training},
3:     where the DNN weights are kept sparse as far as possible, so as to reduce computational costs during training.
4:     Existing sparse training methods are often empirical and can have lower accuracy relative to the dense baseline. In this paper, we present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of DNNs, demonstrate convergence for a variant of the algorithm, and show that AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets; at high sparsity levels, AC/DC even outperforms existing methods that rely on accurate pre-trained dense models. An important property
5:     of AC/DC is that it allows \emph{co-training} of dense and sparse models, yielding accurate \emph{sparse--dense model pairs} at the end of the training process. This is useful in practice, where compressed variants may be desirable for deployment in resource-constrained settings without re-doing the entire training flow, and also provides us with insights into the accuracy gap between dense and compressed models. 
6:     The code is available at: \url{https://github.com/IST-DASLab/ACDC}.
7:     
8:     
9: \end{abstract}
10: