
\begin{abstract}

Synchronous strategies with data parallelism, such as Synchronous Stochastic Gradient Descent (S-SGD) and model averaging methods, are widely used in the distributed training of Deep Neural Networks (DNNs), largely owing to their ease of implementation and promising performance. In these strategies, each worker in the cluster hosts a copy of the DNN and an evenly divided share of the dataset, and uses a fixed mini-batch size so that training converges. Because of synchronization and network transmission delays, workers with different computational capabilities must wait for one another, so the high-performance workers inevitably waste computation while idle. Consequently, the utilization of the cluster is relatively low. To alleviate this issue, we propose the Dynamic Batch Size (DBS) strategy for the distributed training of DNNs. Specifically, the performance of each worker is first evaluated from its observed behavior in the previous epoch, and then the batch size and dataset partition are dynamically adjusted according to the worker's current performance, thereby improving the utilization of the cluster. To verify the effectiveness of the proposed strategy, we conduct extensive experiments. The results indicate that the proposed strategy can fully utilize the performance of the cluster, reduce the training time, and remain robust to disturbance from unrelated tasks. Furthermore, we provide a rigorous theoretical analysis that proves the convergence of the proposed strategy.

\end{abstract}
