
\begin{abstract}

We propose {\rm \texttt{ResIST}}, a novel distributed training protocol for Residual Networks (ResNets).

{\rm \texttt{ResIST}} randomly decomposes a global ResNet into several shallow sub-ResNets, which are trained independently in a distributed manner for several local iterations before their updates are synchronized and aggregated into the global model.

In the next round, new sub-ResNets are randomly generated and the process repeats until convergence.

By construction, in each iteration {\rm \texttt{ResIST}} communicates only a small portion of the network parameters to each machine and never uses the full model during training.

Thus, {\rm \texttt{ResIST}} reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of those of full-model training.

Compared to common protocols such as data-parallel training and data-parallel training with local SGD, {\rm \texttt{ResIST}} reduces communication and compute requirements while remaining competitive in model performance.

\end{abstract}
