\begin{abstract}
Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (collectively referred to as \textit{messages}) across devices incur significant communication overhead for nodes with remote neighbors on other devices (\textit{marginal nodes}) and unnecessary waiting time for nodes without remote neighbors (\textit{central nodes}). This paper proposes AdaQP, an efficient system that expedites distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers to reduce communication traffic, and we overlap the communication of marginal nodes with the computation of central nodes. We provide a theoretical analysis that proves fast training convergence (at the rate of $O(T^{-1})$, with $T$ being the total number of training epochs) and, based on this analysis, design an adaptive scheme that assigns a quantization bit-width to each message, targeting a good trade-off between training convergence and efficiency. Extensive experiments on mainstream graph datasets show that AdaQP substantially improves the throughput of distributed full-graph training (up to 3.01$\times$) with a negligible accuracy drop (at most 0.30\%) or even an accuracy improvement (up to 0.19\%) in most cases, showing significant advantages over state-of-the-art works. The code is available at \href{https://github.com/raywan-110/AdaQP}{https://github.com/raywan-110/AdaQP}.
\end{abstract}
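
\noindent\textit{Illustrative note.} To make the phrase ``stochastically quantize messages to lower-precision integers'' concrete, the following is a minimal sketch assuming the standard unbiased stochastic-rounding scheme. The symbols $m$, $M$, $b$, $s$, $\tilde{x}$, $q$ and $\xi$ are introduced here for illustration only and are not taken from the abstract; the scheme actually used by AdaQP may differ in detail. For a message entry $x \in [m, M]$ with $M > m$ and a bit-width $b$, let
\[
  s = \frac{M - m}{2^{b} - 1}, \qquad
  \tilde{x} = \frac{x - m}{s}, \qquad
  q = \lfloor \tilde{x} \rfloor + \xi, \quad
  \xi \sim \mathrm{Bernoulli}\bigl(\tilde{x} - \lfloor \tilde{x} \rfloor\bigr),
\]
so that $q \in \{0, 1, \dots, 2^{b} - 1\}$ is the $b$-bit integer that is transferred, and the receiver reconstructs $\hat{x} = s\,q + m$. Under this assumption the reconstruction is unbiased with variance bounded in terms of the bit-width:
\[
  \mathrm{E}[\hat{x}] = x, \qquad
  \mathrm{Var}[\hat{x}] = s^{2}\,\mathrm{Var}[\xi] \le \frac{s^{2}}{4} = \frac{(M - m)^{2}}{4\,(2^{b} - 1)^{2}}.
\]
For example, with $m = 0$, $M = 1$, $b = 2$ and $x = 0.3$, we get $s = 1/3$ and $\tilde{x} = 0.9$, so $q = 1$ with probability $0.9$ and $q = 0$ with probability $0.1$, giving $\mathrm{E}[\hat{x}] = 0.9 \cdot \tfrac{1}{3} = 0.3$. The bound suggests why a per-message bit-width is a natural tuning knob: a smaller $b$ reduces traffic but enlarges the quantization variance.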