\begin{abstract}
Most methods for Personalized PageRank (PPR) precompute and store all
\emph{accurate} PPR vectors and, at query time, directly return the ones of
interest. However, storing and computing all accurate PPR vectors
can be prohibitive for large graphs, especially when
caching them in memory for real-time online querying. In this paper, we propose a
distributed framework that strikes a better balance between \emph{offline
indexing} and \emph{online querying}.
The offline indexing computes a fingerprint of the PPR vector of each
vertex by performing billions of
``short'' random walks in parallel across a cluster of machines. We prove
that our indexing method converges exponentially,
achieving the same precision as previous methods
with far fewer random walks. At query time,
the requested PPR vector is composed as a linear
combination of related fingerprints, in a highly efficient vertex-centric
decomposition manner. Interestingly, \emph{the resulting PPR
vector is much more accurate than its offline counterpart because it
actually uses more random walks in its estimation}. More importantly, we
show that the decompositions for a batch of queries can be processed very
efficiently using a shared decomposition. Our implementation,
\emph{\sysname}, takes advantage of advanced distributed graph engines and
outperforms state-of-the-art algorithms by orders of magnitude.
In particular, it
responds to tens of thousands of queries on graphs with billions of edges
in just a few seconds.
\end{abstract}