\begin{abstract}
We reduce the cost of communication and synchronization in graph processing
by analyzing the fastest way to process graphs:
pushing the updates to a shared state or pulling the updates to
a private state.
We investigate the applicability of this push-pull dichotomy to various algorithms
and its impact on complexity, performance, and the number of locks, atomics, and reads/writes used.
We consider 11 graph algorithms, 3 programming
models, 2 graph abstractions, and various families
of graphs.
Our analysis reveals surprising differences between the push and pull
variants of different algorithms in performance, speed of convergence, and code
complexity; the insights are backed up by performance data from
hardware counters. We use these findings to show which variant is faster
for each algorithm and to develop generic strategies that enable even
higher speedups.
Our insights can be used to accelerate graph processing engines or libraries on
both massively parallel shared-memory machines and distributed-memory
systems.
\end{abstract}