\begin{abstract}
We develop a scalable and extensible training framework that
uses GPUs across the nodes of a cluster to accelerate the
training of deep learning models through data parallelism.
The framework implements both synchronous and asynchronous
training, with parameters exchanged among GPUs via
CUDA-aware MPI. In this report, we analyze the convergence of the
framework and its ability to reduce training time when
scaling the synchronous training of AlexNet and
GoogLeNet from 2 to 8 GPUs. In addition, we explore
novel ways to reduce the communication overhead caused
by exchanging parameters. Finally, we release the framework as
open source to support further research on distributed deep learning\footnote{\url{https://github.com/uoguelph-mlrg/Theano-MPI}}.
\end{abstract}