\begin{abstract}
%-------------------------------------------------------------------------------
It is challenging to train large DNN models on sophisticated
GPU platforms with diverse interconnect capabilities.
Recently, pipelined training has been proposed
as an effective approach for improving device utilization.
However, two issues remain open:
improving computational efficiency while ensuring convergence, and
reducing memory usage without incurring additional computation costs.
We propose \emph{DAPPLE}, a synchronous training framework that combines
data parallelism and pipeline parallelism for large DNN models.
It features a novel parallelization strategy \emph{planner} %for synchronous training(friendly for model convergence)
that solves the partition and placement problems and explores optimal hybrid strategies of data and pipeline parallelism.
We also propose a new runtime scheduling algorithm that reduces device
memory usage; it is orthogonal to re-computation approaches and does not come
at the expense of training throughput.
Experiments show that the \emph{DAPPLE planner} consistently outperforms strategies generated by PipeDream's planner, achieving up to $3.23\times$ speedup under synchronous training, and that the \emph{DAPPLE runtime} achieves $1.6\times$ higher training throughput than GPipe while reducing memory consumption by 12\%.
%given a fixed global batch size, 
% \emph{DAPPLE} outperforms the best data parallelism baselines with 1.71X/1.37X/1.79X % (up to 2.32X for GNMT-16% on \cb{config $C$})
% training speedups on three typical cluster environments.
% Note: these numbers are calculated with **GBS=128** for all models

\end{abstract}
