\begin{abstract}
Graph neural network (GNN) training methods fall into two main categories: mini-batch training and full-batch training.
Mini-batch training samples subgraphs from the original graph in each iteration; this sampling introduces extra computation overhead and degrades training accuracy.
In contrast, full-batch training computes the features and corresponding gradients of all vertices in each iteration and therefore achieves higher convergence accuracy.
However, in a distributed cluster, frequent remote accesses to vertex features and gradients incur heavy communication overhead, which limits overall training efficiency.


In this paper, we introduce CDFGNN, a cache-based distributed full-batch graph neural network training framework.
We propose an adaptive cache mechanism that reduces remote vertex accesses by caching the historical features and gradients of neighbor vertices.
We further reduce communication overhead by quantizing messages and by designing a graph partitioning algorithm tailored to the hierarchical communication architecture.
Experiments show that the adaptive cache mechanism reduces remote vertex accesses by $63.14\%$ on average.
Combined with communication quantization and the hierarchical graph partitioning algorithm, CDFGNN outperforms state-of-the-art distributed full-batch training frameworks by $30.39\%$ in our experiments.
These results indicate that CDFGNN has great potential for accelerating distributed full-batch GNN training.
\end{abstract}