9180cbf50034e98a.tex
1: \begin{abstract}
2:     Most methods for Personalized PageRank (PPR) precompute and store all
3:     \emph{accurate} PPR vectors, and at query time, return the ones of interest
4:     directly. However, the storage and computation of all accurate PPR vectors
5:     can be prohibitive for large graphs, especially when
6:     caching them in memory for real-time online querying.  In this paper, we propose a
7:     distributed framework that strikes a better balance between \emph{offline
8:     indexing} and \emph{online querying}.
9:     The offline indexing computes a fingerprint of the PPR vector of each
10:     vertex by performing billions of
11:     ``short'' random walks in parallel across a cluster of machines.  We prove
12:     that our indexing method converges
13:     exponentially, achieving the same precision as previous methods
14:     with far fewer random walks.  At query time,
15:     the PPR vector of a query vertex is composed as a linear
16:     combination of related fingerprints through a highly efficient vertex-centric
17:     decomposition.  Interestingly, \emph{the resulting PPR
18:     vector is much more accurate than its offline counterpart because it
19:     actually uses more random walks in its estimation}. More importantly, we
20:     show that a batch of queries can be processed very efficiently
21:     by sharing a single decomposition. Our implementation,
22:     \emph{\sysname}, takes advantage of advanced distributed graph engines and
23:     outperforms the state-of-the-art algorithms by orders of magnitude.
24:     In particular, it
25:     responds to tens of thousands of queries on graphs with billions of edges
26:     in just a few seconds.
27: \end{abstract}
28: