\begin{abstract}
We develop a scalable and extensible training framework that uses GPUs
across the nodes of a cluster to accelerate the training of deep learning
models through data parallelism. The framework implements both synchronous
and asynchronous training, with parameter exchange among GPUs based on
CUDA-aware MPI. In this report, we analyze the convergence of the framework
and its ability to reduce training time when scaling the synchronous
training of AlexNet and GoogLeNet from 2 GPUs to 8 GPUs. In addition, we
explore novel ways to reduce the communication overhead caused by
exchanging parameters. Finally, we release the framework as open source for
further research on distributed deep learning\footnote{\url{https://github.com/uoguelph-mlrg/Theano-MPI}}.
\end{abstract}