
\begin{abstract}

Communication cost is the main bottleneck in the design of effective distributed learning algorithms. Recently, event-triggered techniques have been proposed to reduce the information exchanged among compute nodes and thus alleviate the communication cost. However, most existing event-triggered approaches consider only heuristic event-triggered thresholds. They also ignore the impact of computation and network delay, which play an important role in training performance. In this paper, we propose an Asynchronous Event-triggered Stochastic Gradient Descent (SGD) framework, called \texttt{AET-SGD}, to i) reduce the communication cost among the compute nodes, and ii) mitigate the impact of the delay.

% , and iii) support both static and dynamic network topology.

Compared with baseline event-triggered methods, \texttt{AET-SGD} employs an event-triggered threshold based on a linearly increasing sample size, and can significantly reduce the communication cost while maintaining good convergence performance.

We implement \texttt{AET-SGD} and evaluate its performance on multiple representative data sets, including MNIST, FashionMNIST, KMNIST and CIFAR$10$. The experimental results validate the correctness of the design and show a significant reduction in communication cost, ranging from $44\times$ to $120\times$, compared to the state of the art.

% Our results also show that \texttt{AET-SGD} can achieve the same level of test accuracy on both static and dynamic network topology.

Our results also show that \texttt{AET-SGD} can tolerate large delays from straggler nodes while maintaining good performance and achieving the desired speedup ratio.

\end{abstract}
