abstract:6695d3683f72ef3a.tex

1: \begin{abstract}

2: Our objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest.

3: %We design a weighted random walk (WRW) algorithm that achieves high efficiency by preferentially crawling those nodes and edges that convey greater information regarding the target metric.

4: We design a random walk on an appropriately defined weighted graph

5: %% Comment: this is accurate and also captures both the definition of categories and setting up the weights

6: that achieves high efficiency by preferentially crawling those nodes and edges that convey greater information regarding the target metric.

7: Our approach begins by employing the theory of stratification

8: %and importance sampling

9: to find optimal node weights, for a given estimation problem, under an independence sampler. While optimal under independence sampling, these weights may be impractical %or even unachievable via

10: under graph crawling due to constraints arising from the structure of the graph.

11: Therefore, the edge weights for our random walk should be chosen to yield an equilibrium distribution that strikes a balance between approximating the optimal weights under an independence sampler and achieving fast convergence.

12: We propose a heuristic approach (stratified weighted random walk, or S-WRW) that achieves this goal, while using only limited information about the graph structure and the node properties.

13: We evaluate our technique both in simulation and experimentally, by collecting a sample of Facebook college users.

14: We show that S-WRW requires \mbox{13--15} times fewer samples than the simple re-weighted random walk (RW) to achieve the same estimation accuracy for a range of metrics.

15:

16: \end{abstract}