\begin{abstract}
Gradient Boosting Decision Tree (GBDT) is an effective yet costly machine learning model. Current parallel GBDT algorithms generally follow a synchronous parallel design. Because processing time varies across nodes in practice, synchronisation in a parallel computing environment takes considerable time. In this paper, we propose an asynchronous parallel GBDT algorithm named asynch-SGBDT. Our theoretical and experimental results indicate that, on high-dimensional sparse datasets, asynch-SGBDT does not slow per-epoch convergence relative to the serial GBDT training process. Asynch-SGBDT achieves a 14x to 22x speedup with 32 workers. By comparison, LightGBM, one benchmark, achieves only a 5x to 7x speedup with 32 machines, and Dimboost, another benchmark, achieves only a 4x to 5x speedup with 32 workers. Both the theoretical and the experimental results indicate that asynch-SGBDT is a state-of-the-art parallel GBDT algorithm.
\end{abstract}