abstract:9180cbf50034e98a.tex

1: \begin{abstract}

2:     Most methods for Personalized PageRank (PPR) precompute and store all

3:     \emph{accurate} PPR vectors, and at query time, return the ones of interest

4:     directly. However, the storage and computation of all accurate PPR vectors

5:     can be prohibitive for large graphs, especially when

6:     caching them in memory for real-time online querying.  In this paper, we propose a

7:     distributed framework that strikes a better balance between \emph{offline

8:     indexing} and \emph{online querying}.

9:     The offline indexing computes a fingerprint of the PPR vector of each

10:     vertex by performing billions of

11:     ``short'' random walks in parallel across a cluster of machines.  We prove

12:     that our indexing method converges

13:     exponentially, achieving the same precision as previous methods

14:     with far fewer random walks.  At query time,

15:     the new PPR vector is composed as a linear

16:     combination of related fingerprints through a highly efficient vertex-centric

17:     decomposition.  Interestingly, \emph{the resulting PPR

18:     vector is much more accurate than its offline counterpart because it

19:     actually uses more random walks in its estimation}. More importantly, we

20:     show that this decomposition for a batch of queries can be processed very

21:     efficiently using a shared decomposition. Our implementation,

22:     \emph{\sysname}, takes advantage of advanced distributed graph engines and

23:     outperforms the state-of-the-art algorithms by orders of magnitude.

24:     In particular, it

25:     responds to tens of thousands of queries on graphs with billions of edges

26:     in just a few seconds.

27: \end{abstract}
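
For orientation, the two ideas summarized in the abstract (estimating PPR
vectors from short random walks, and composing a query vector as a linear
combination of other vectors) can be sketched under one common formulation of
PPR. The symbols below are assumptions introduced only for illustration and
may differ from the notation and definitions used in the paper: $\alpha \in
(0,1)$ is the teleport probability, $P$ is the row-stochastic transition
matrix of the graph, $\mathbf{e}_u$ is the indicator row vector of vertex $u$,
$\mathrm{Out}(u)$ is the set of out-neighbors of $u$, and $L$ is a cap on the
walk length. Every vertex is assumed to have at least one out-edge.

Under these assumptions the PPR vector of $u$ can be written as
\begin{equation}
  \mathbf{p}_u \;=\; \alpha \sum_{k=0}^{\infty} (1-\alpha)^{k}\, \mathbf{e}_u P^{k},
\end{equation}
which is the distribution of the end point of a random walk that starts at
$u$ and stops at each step with probability $\alpha$. Truncating the sum at
$k = L$ discards a total probability mass of
\begin{equation}
  \alpha \sum_{k=L+1}^{\infty} (1-\alpha)^{k} \;=\; (1-\alpha)^{L+1},
\end{equation}
so the contribution of long walks decays geometrically in $L$. This is only
the standard tail bound for this formulation, not a restatement of the
convergence result proved in the paper.

Splitting off the $k = 0$ term of the same series gives the classical
decomposition identity
\begin{equation}
  \mathbf{p}_u \;=\; \alpha\, \mathbf{e}_u
  \;+\; \frac{1-\alpha}{|\mathrm{Out}(u)|} \sum_{w \in \mathrm{Out}(u)} \mathbf{p}_w ,
\end{equation}
which expresses the vector of a query vertex as a linear combination of the
vectors of its out-neighbors. Applying the identity repeatedly pushes the
dependence to vertices further from $u$, which is one way to read the
``linear combination of related fingerprints'' described above. The
vertex-centric scheme used in the paper may differ in its details.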

28: