\begin{abstract}
% Graph neural network training methods can be categorized into full-batch training and mini-batch training.
Graph neural network (GNN) training methods fall mainly into two categories: mini-batch and full-batch training.
The mini-batch method samples subgraphs from the original graph in each iteration.
This sampling introduces extra computation overhead and reduces training accuracy.
In contrast, the full-batch method computes the features and corresponding gradients of all vertices in each iteration, and therefore converges to higher accuracy.
However, in a distributed cluster, frequent remote accesses to vertex features and gradients incur heavy communication overhead, which limits overall training efficiency.

In this paper, we introduce CDFGNN, a cache-based distributed full-batch graph neural network training framework.
We propose an adaptive cache mechanism that reduces remote vertex accesses by caching the historical features and gradients of neighbor vertices.
We further reduce the communication overhead by quantizing messages and by designing a graph partitioning algorithm for the hierarchical communication architecture.
Experiments show that the adaptive cache mechanism reduces remote vertex accesses by $63.14\%$ on average.
Combined with communication quantization and the hierarchical graph partitioning algorithm, CDFGNN outperforms state-of-the-art distributed full-batch training frameworks by $30.39\%$ in our experiments.
Our results indicate that CDFGNN has great potential for accelerating distributed full-batch GNN training tasks.
% Furthermore, we further reduce the communication overhead by message quantization and designing the graph partitioning algorithm for the hierarchical communication architecture.
\end{abstract}