\begin{abstract}
Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.
Still, it has been used primarily for inference---not training.
Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, which limits statistical accuracy.
To address this issue, we describe a simple low-precision stochastic gradient descent variant called \sysname{}.
\sysname{} converges at the same theoretical rate as full-precision algorithms despite the noise introduced by using low precision throughout execution.
The key idea is to use SVRG to reduce gradient variance, and to combine this with a novel technique called \emph{bit centering} to reduce quantization error.
We show that on the CPU, \sysname{} can run up to $\numb{4} \times$ faster than full-precision SVRG and can match its convergence trajectory.
We implemented \sysname{} in TensorQuant, and show that it exceeds the validation performance of plain low-precision SGD on two deep learning tasks.
\end{abstract}